British Antarctic Survey, NERC, High Cross, Madingley Road, Cambridge CB3 0ET, UK
Seven independent clone libraries were constructed to study the biodiversity of the bacterioplankton in the surface waters around Southern Thule, South Sandwich Islands, in order to identify the species present, to determine the sample effort required to estimate the total diversity, and to determine whether the surface waters around Southern Thule represented a highly specialized local anomaly or a subset of the marine meta-community. In total, 672 clones generated 629 useable sequences. These 629 clones matched 278 different sequences deposited in the 16S rDNA sequence databases. The majority of the clones were related to marine microorganisms, many of which had been previously detected in permanently cold Arctic and Antarctic marine environments. Each clone library generated an average of 35.8 new sequence matches. 346 clones covered two-thirds of the total estimated diversity, while 438 clones covered three-quarters of the total estimated diversity. Above this number, the coverage tended to stabilize and a relatively large number of additional clones were required to improve coverage significantly, increasing at the rate of about one new sequence match per 100 new clones. Comparing the different clone libraries, eight matches occurred in each of the seven libraries, whilst fifty-five occurred in only one, suggesting that there might be a relatively small number of common dominant ubiquitous species, with a much larger underlying diversity or ‘seed bank’ from which this dominant diversity is drawn. This study suggests that the dominant bacterioplankton in the surface waters around Southern Thule represent a subset of the marine meta-community, whilst sub-dominant diversity appears to be a highly specialized local anomaly.
(Received July 19 2007)
(Accepted January 29 2008)
This publication is dedicated to the memory of Dr Helen R. Wilcock for whom to visit Antarctica was a lifelong dream.
List of Figures and Tables
Fig. 1. Rate of new sequence acquisition.
Fig. 2. Percentage new sequence identified per clone library.
Fig. 3. Rank abundance curves.
Table I. Closest BLAST matches to specific identifiable components of the bacterioplankton.
Table II. Number of sequences obtained in each independent clone library.
Table III. Estimated clone library coverage.
Table IV. Sorensen coefficient of similarity between samples.
Table V. Chao estimation of total species richness for each of the clone libraries.
In a recent study, which included an Arctic marine sample, Pommier et al. (2007) investigated global patterns of diversity and community structure in the marine bacterioplankton using clone libraries of 16S rRNA genes. Remarkably, they found that the global marine bacterioplankton community showed a high degree of endemism, which also included a few cosmopolitan species. Here, a large clone library was constructed to study the bacterioplankton diversity in the surface waters around Southern Thule in the South Sandwich Islands of the Scotia Arc, Southern Ocean. Such a study results from increased interest in the microbial ecology of the Southern Ocean, following estimates of a reduction in its capacity to act as a sink for atmospheric CO2, and particularly in view of a potential saturation due to recent climate change (Le Quéré et al. 2007).
The first stage in the analysis of a microbial community within a given ecosystem is to gain some understanding of biodiversity or species richness, a count of the number of different species in a given area. No two naturally assembled microbial communities appear to be the same and, in practical terms, physically identical environments will have different species compositions if they are formed at random from very large meta-communities (Curtis & Sloan 2004). However, because of their small size, restricted morphological differences and enormous population densities, estimating species richness has often proved difficult for the bacteria. It is a commonly held view that the diversity we observe among the bacteria is vastly exceeded by the diversity we cannot observe (Lunn et al. 2004). The two approaches most commonly used are direct culture of the organisms and indirect analysis of DNA fragments extracted from the whole population through the construction and analysis of 16S rRNA clone libraries. Whilst both are powerful techniques, both have their limitations. For culture studies it is now widely accepted that 1% or less of the bacteria present in a given environment are amenable to culture, and so any inferences made from the cultured organisms would apply to a very small fraction of the bacterial community. In the analysis of 16S rRNA clone libraries the labour-intensive nature of the process and its cost have often limited sample size, and commonly only 60–200 clones are analysed per sample. However, with more than 10^30 bacteria on Earth that we know about (Whitman et al. 1998) and c. 160 species per ml of seawater where the entire sea might hold no more than 2 million taxa (Curtis et al. 2002, Torsvik et al. 2002), clone libraries may also be sampling only a tiny fraction of the total diversity present. Indeed, to date, no complex community has been sampled to saturation. This has been illustrated by environmental genome shotgun sequencing of the Sargasso Sea conducted by Venter et al. (2004), which found 148 new phylotypes in the already well characterized Sargasso Sea, suggesting that increased sampling effort will reveal many new and potentially rare sequences (Curtis & Sloan 2004). Generally, in any community, the number of types of organisms observed increases with sampling effort until all types are observed (Hughes et al. 2001), and to date the ability to measure bacterial diversity accurately has been limited by the size of samples needed to give adequate coverage of bacterial communities (Dunbar et al. 2002). Estimating diversity from a clone library is, therefore, essentially a sampling problem (Lunn et al. 2004). There is already an extensive literature on biodiversity estimation, but even so, radically different diversities are often observed. Ultimately, microbes are too diverse to count exhaustively. Whilst it would be useful to know the actual diversity of different microbial communities, most diversity questions address how diversity changes across biotic and abiotic gradients. The answers to these questions require knowing only relative diversities among sites, over time, and under different regimes (Hughes et al. 2001). In addition, to understand microbial systems at the local level we will have to understand something of the meta-community from which they are drawn (Curtis & Sloan 2004).
When sampling an ecosystem as large as the Southern Ocean, it is imperative to know whether the samples obtained are representative of the whole ecosystem, or highly specialized local assemblages. In the last few years, there has been some debate in the literature as to whether geographic barriers (Whitaker et al. 2003) or dispersal limitation (Telford et al. 2006) can prevent a cosmopolitan distribution of microorganisms, or whether microbial species are ubiquitous (Finlay 2002). This is an interesting question for the Southern Ocean, where such constraining factors might be very subtle. Curtis & Sloan (2004) suggested that global reservoirs of diversity are an important driving force behind patterns in localized diversity. Thus, where the reservoir community is very large and relatively even, chance alone will prevent physically identical communities from having the same, or sometimes even stable communities. However, as the number of environments studied continues to grow, so patterns in the global diversity and distribution of different microorganisms are beginning to emerge, and although there is currently no consensus regarding the geographic distribution and extent of endemism in Antarctic microorganisms, increasingly clones are being identified which occur only in the Antarctic, in the Antarctic and the Arctic or extensively in psychrophilic environments. In addition, many of the clones analysed from Antarctic studies have closest matches to clones from other Antarctic studies and further, these studies might be limited only by the relative paucity of Antarctic sequences in the 16S rRNA databases.
In this study a large clone library was constructed and analysed to investigate bacterioplankton biodiversity in the surface waters around Southern Thule in the South Sandwich Islands.
Southern Thule is an island in the South Sandwich Islands (59.5°S 27.3°W), which form part of the Scotia Arc in the Southern Ocean. Surface water samples were collected using a CTD from the RRS James Clark Ross on Cruise JR144 as part of the BAS Long Term Monitoring and Survey Programme. A total of 300 litres of surface water was collected at a depth of 30 m. A 1 l sub-sample was taken and the bacterioplankton extracted onto sterile 0.2 µm cellulose nitrate membrane filters (47 mm diameter), and frozen (-70°C) until further analysis.
Samples were filtered through 0.2 µm cellulose nitrate filters (Sartorius, Goettingen, Germany). The filters were placed in a 30 ml sample tube (Sterilin, Feltham, Middlesex, UK) with 18 ml of the original marine water sample, which was then mixed for 3 min. The resulting suspension was centrifuged for up to 60 min at 14 000 rpm (Force 14 Microcentrifuge, Denver Instruments, Denver, CO). The pellet was resuspended in 1 ml marine water, which was then centrifuged for a further 60 min. The resultant pellet was resuspended in 10 µl of marine water. DNA was extracted by one of two methods after resuspension of the cells from the filter surface. The first was based upon an SDS, phenol-chloroform extraction and ethanol precipitation as described by Fuhrman et al. (1988); the second was based upon direct freeze-thaw cycling.
Enzymatic amplification of 16S rRNA gene fragments was performed on DNA extracted directly from the marine samples. For these amplifications, each PCR mixture (30 µl) contained 10 ng of extracted DNA as a template, 10 pmol of each primer and 30 µl ReddyMix PCR master mix (ABgene, Epsom, UK). The primers used were 8F and 1492R. Amplification reactions were performed with a Genius thermocycler (Techne, Minneapolis, USA) using the following conditions: an initial denaturation step of 94°C for 5 min; 30 cycles of 94°C for 45 s, 55°C for 45 s and 72°C for 70 s; and a final elongation step of 72°C for 5 min. Controls containing no DNA were also used to ensure that contaminants were not being amplified.
Seven independent clone libraries were constructed from separate DNA extractions and amplifications. The PCR products were cleaned using GFX PCR clean up columns (Pharmacia, New Jersey, USA). Cleaned products were ligated into the pGEMT-Easy vector (Promega, Wisconsin, USA) and ligation mixtures were transformed into XL-2 Blue MRF Ultracompetent cells as recommended by the manufacturer. Transformants were screened using black/white selection on Luria agar containing S-gal/IPTG and 50 µg ml−1 ampicillin (Sigma, St Louis, USA). Putative positive colonies were transferred to 96 well plates containing 50 µl of sterile water. The cell suspensions were subjected to two freeze/thaw cycles and 1 µl aliquots were used as templates in a PCR reaction containing the M13F/M13R universal primers: M13F 5′-CGC CAG GGT TTT CCC AGT CAC GAC-3′ and M13R 5′-GAG CGG ATA ACA ATT TCA CAC AGG-3′. PCR conditions were as above, except that the annealing temperature was raised to 58.5°C. Each clone was sequenced using the M13F primer, either with the DYEnamic ET Dye Terminator Kit (Amersham Biosciences, Buckinghamshire, UK) on a MegaBACE 500 capillary sequencer at the British Antarctic Survey (BAS) or commercially (Macrogen, Seoul, Korea).
Initially, clone sequences were compared with the EMBL nucleotide data library using GAPPED-BLAST (Basic Local Alignment Search Tool) searches (Altschul et al. 1997). Sequences with consistent BLAST matches within each clone library were grouped using Clustal-W. Sequences were then divided into two groups: the first consisted of sequences with the same BLAST match in which sequences differed by < 3% (hereafter referred to as matches at the operational taxonomic unit (OTU) identification level), and the second consisted of the closest BLAST matches irrespective of sequence percentage match (hereafter referred to as matches at the BLAST identification level). The library coverage value was calculated as $C = 1 - (n_1/N)$, where $n_1$ is the number of clones which occurred only once in the library and $N$ is the total number of sequences in the library (Good 1953). The diversity was estimated using the Simpson diversity index (Simpson 1949), $1 - \sum_i P_i^2$, where $P_i$ is the proportion of species $i$ observed (number of clones of that species / total number of clones), and also the Chao1 estimator (Chao 1984, Lee & Chao 1994), $S_{\mathrm{Chao1}} = S_{\mathrm{obs}} + n_1^2/(2n_2)$, where $S_{\mathrm{obs}}$ is the total number of observed species, $n_1$ is the number of singletons (species captured once) and $n_2$ is the number of doubletons (species captured twice). Similarity between independent clone libraries was determined using the Sorensen coefficient, $S = 2a/(2a + b + c)$, where $a$ is the number of sequences in common, $b$ is the number found only in the first comparison library and $c$ is the number found only in the second.
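The coverage and diversity formulas above can be illustrated with a minimal sketch in Python. This is not the software used in the study; the function name `abundance_stats` and the example counts are hypothetical, and the fallback used when there are no doubletons (the bias-corrected form of Chao1) is an assumption added so that the function never divides by zero.

```python
def abundance_stats(counts):
    """Summarise one clone library.

    counts: number of clones observed for each distinct sequence match
            (hypothetical input; every entry must be a positive integer).
    """
    total = sum(counts)                       # N, total clones in the library
    s_obs = len(counts)                       # S_obs, distinct matches observed
    n1 = sum(1 for c in counts if c == 1)     # singletons
    n2 = sum(1 for c in counts if c == 2)     # doubletons

    # Good's coverage: C = 1 - n1 / N
    coverage = 1 - n1 / total

    # Simpson diversity: 1 - sum(P_i^2), with P_i = clones of match i / N
    simpson = 1 - sum((c / total) ** 2 for c in counts)

    # Chao1: S_obs + n1^2 / (2 * n2). With no doubletons the classic form is
    # undefined, so the bias-corrected form is used instead (an assumption).
    if n2 > 0:
        chao1 = s_obs + n1 ** 2 / (2 * n2)
    else:
        chao1 = s_obs + n1 * (n1 - 1) / 2

    return {"coverage": coverage, "simpson": simpson, "chao1": chao1}


if __name__ == "__main__":
    # Invented counts for a 16-clone library with 8 distinct matches.
    print(abundance_stats([5, 3, 2, 2, 1, 1, 1, 1]))
```

For the invented library above, four of the sixteen clones are singletons, so coverage is 0.75, and the Chao1 estimate is 8 + 16/4 = 12 species.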
Samples taken at 30 m around Southern Thule had a consistent salinity of 33.5‰ and the temperature fluctuated between 0 and 0.11°C.
In total, 672 clones generated 629 useable sequences. These 629 clones matched 278 different sequences (at the BLAST identification level) deposited in the 16S rDNA sequence databases in 403 different ways (at the OTU identification level) (Table I). Patterns were consistent for each of the seven libraries (Figs 1 & 2). Each clone library of 96 sequences generated an average of 35.8 new matches at the BLAST identification level and 55.5 new sequences at the OTU identification level (Table II). In any single clone library, the maximum repetition was 17 from 87 (19.6%). At the BLAST identification level 629 sequences represented a cumulative 77.1% coverage of the potential biodiversity present, and gave a 1% chance of detecting a novel sequence with each new clone sequenced. At the OTU identification level, 629 sequences represented 61.2% coverage of the potential biodiversity present, and gave a 3.5% chance of detecting a novel sequence with each new clone sequenced (Table III). Comparing the independent clone libraries, eight matches occurred in each of the seven libraries, whilst 55 occurred in only one, suggesting that there might be a relatively small number of common dominant organisms, with a much larger underlying diversity. A sample of 346 clones (roughly four 96 well plates) will yield two-thirds of the total estimated diversity at the BLAST identification level, and 50% of the total estimated diversity at the OTU identification level; 438 clones (roughly five 96 well plates) will yield three-quarters of the total estimated diversity at the BLAST identification level and two-thirds of the total estimated diversity at the OTU identification level. Above this number, the coverage tended to stabilize and a relatively large number of clones were needed to improve coverage significantly, increasing at the rate of about 1 novel sequence per 100 clones at the BLAST identification level or 3.5 novel sequences per 100 clones at the OTU identification level.
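The rate of new sequence acquisition across successive libraries can be tabulated with a short routine such as the following. It is a sketch only: the function name `new_matches_per_library` and the three small example libraries are hypothetical and do not reproduce the study data.

```python
def new_matches_per_library(libraries):
    """Return (cumulative clones, new matches, cumulative matches) per library.

    libraries: list of libraries, each a list of match labels, one per clone
               (hypothetical input format).
    """
    seen = set()      # matches found in any earlier library
    clones = 0        # running total of clones sequenced
    rows = []
    for library in libraries:
        clones += len(library)
        new = set(library) - seen   # matches not seen in earlier libraries
        seen |= new
        rows.append((clones, len(new), len(seen)))
    return rows


if __name__ == "__main__":
    # Three invented libraries; letters stand in for database matches.
    example = [["A", "A", "B", "C"], ["A", "B", "D"], ["A", "C", "D", "E"]]
    for clones, new, total in new_matches_per_library(example):
        print(clones, new, total)
```

Plotting the second or third column against the first gives an accumulation curve; a flattening curve corresponds to the stabilizing coverage described above.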
Of the ten most abundant clones, those of Pelagibacter ubique and Candidatus P. ubique accounted for 112 out of 629 clones (18%). The next four most abundant sequences represent diatom plastids: Skeletonema pseudocostatum, Bacillaria paxillifer, Phaeodactylum tricornutum and Asterionellopsis glacialis. The remaining four are all bacterioplankton: Owenweeksia hongkongensis, an aerobic Gram-negative, non-fermentative, rod shaped, motile, orange pigmented bacterium, isolated from seawater (Lau et al. 2005); Sulfitobacter brevis, an aerobic, Gram-negative, pointed and budding bacterium, potentially motile, and containing storage granules, isolated from the hypersaline meromictic Ekho Lake (Vestfold Hills, Antarctica) (Labrenz et al. 2000); Phycococcus omphalius, an alpha-Proteobacterium isolated from a depth of 10 m in the Sargasso Sea (Stingl et al. 2007) and HTCC2120, an unclassified marine gamma-Proteobacterium.
Comparison of each of the clone libraries using the Sorensen coefficient is given in Table IV. All values lay between 0.42 and 0.63, where 0 represents no sequences in common and 1 represents the same sample. Random clone library sizes of between 69 and 96, therefore, produced c. 50% similar sequence information and 50% new sequences, with a mean coverage of 56% at the BLAST identification level and 36% at the OTU identification level. Rank abundance curves are given in Fig. 3. Simpson diversity calculations for each of the libraries lay between 0.86 and 0.97 (with a value of 0.96 for the full 629 sequences), where 0 represents a low diversity and 1 represents a high diversity. The Chao1 estimator predicted total species richness for each of the libraries (based upon the number of singletons and the number of doubletons) (Table V). The estimated species richness ranged from 48 to 162 using individual libraries (26–88.6% of the estimate using all 629 sequences). The estimated species richness using all 629 sequences was 183.
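A pairwise comparison of this kind can be sketched as follows. The sketch assumes that the Sorensen coefficient is computed on the sets of distinct matches in each library, with b and c counting the matches found in only one of the two libraries, so that identical libraries score 1 and libraries with nothing in common score 0. The function names and example data are hypothetical.

```python
from itertools import combinations


def sorensen(library_a, library_b):
    """Sorensen coefficient S = 2a / (2a + b + c) for two libraries.

    a = matches shared by both libraries, b = matches only in the first,
    c = matches only in the second (assumed interpretation).
    """
    set_a, set_b = set(library_a), set(library_b)
    a = len(set_a & set_b)
    b = len(set_a - set_b)
    c = len(set_b - set_a)
    denominator = 2 * a + b + c
    # Two empty libraries share nothing; report 0.0 rather than divide by zero.
    return 2 * a / denominator if denominator else 0.0


def pairwise_sorensen(libraries):
    """Return {(i, j): S} for every pair of libraries, indexed from 0."""
    return {
        (i, j): sorensen(libraries[i], libraries[j])
        for i, j in combinations(range(len(libraries)), 2)
    }


if __name__ == "__main__":
    # Invented libraries; letters stand in for database matches.
    example = [["A", "B", "C"], ["A", "B", "D"], ["E", "F"]]
    for pair, value in pairwise_sorensen(example).items():
        print(pair, round(value, 2))
```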
Using a relatively small number of random clones from seven independent clone libraries constructed from DNA extracted from Southern Ocean surface seawater, it was possible to identify clear patterns in prokaryote biodiversity. 346 clones covered two-thirds of the total estimated diversity, while 438 clones covered three-quarters of the total estimated diversity. However, above this number, a relatively large number of additional clones were needed to improve coverage significantly, increasing at the rate of about one new sequence match per 100 clones.
There were fewer than ten frequently encountered clones that occurred in each of the independent clone libraries, and these included: Pelagibacter ubique strain HTCC1002 (and Candidatus Pelagibacter ubique HTCC1062), Skeletonema pseudocostatum, Bacillaria paxillifer, Phaeodactylum tricornutum, Asterionellopsis glacialis, Owenweeksia hongkongensis, Sulfitobacter brevis, Phycococcus omphalius and marine gamma proteobacterium HTCC2120. In addition, out of 629 useable sequences, there were only 55 clones that were identified only once in a single library. Taken together, this pattern suggests that there is a relatively small number of ubiquitous dominant species, with a long tail or ‘seed bank’ from which this dominant diversity is drawn. This is supported by the apparent decrease in the rate of discovery of new sequences in marine environments (Hagström et al. 2002), which might reflect the abundant distribution of a minority of organisms rather than an absolute lack of diversity in the seas (Lunn et al. 2004). However, clone frequencies in a library cannot be related directly to gene frequency in the environment, so this would need to be confirmed with quantitative PCR (Q-PCR) or fluorescent in situ hybridization (FISH) targeting the potential dominant groups.
From the list of potential dominant groups, Bacillaria paxillifer, Phaeodactylum tricornutum and Asterionellopsis glacialis were identified from chloroplast 16S rRNA genes. Indeed, a total of 27.6% of sequences retrieved were derived from chloroplast related sequences, and this could be related to the presence of a phytoplankton bloom around Southern Thule at the time of sampling. Gentile et al. (2006) also found that libraries from two Antarctic marine sites were enriched by picophytoplanktonic 16S sequences of plastid and mitochondrion origins, which they attributed to algal blooms that occurred during sampling.
The presence of common marine species and clones related to common marine species suggests that this dataset could support the idea that there are common marine clusters. Examples of marine clusters include SAR 11 (Field et al. 1997) and Marine Group I archaea (Fuhrman et al. 1992, DeLong 1998, Fuhrman & Davis 1997). Such patterns are also seen in freshwater systems (Zwart et al. 1998, Glöckner et al. 2000, Hahn 2003), and have also been suggested for Antarctic freshwater bacterioplankton (Pearce et al. 2007).
The great majority of the clones were related to psychrophilic or psychrotrophic microorganisms, previously detected in permanently cold Arctic and Antarctic marine environments. This appears to be a common observation for Antarctic marine and freshwater systems. Bowman et al. (1997) studied the bacterial populations associated with sea ice sampled from Antarctic coastal areas. They found that most of the sea ice strains examined appeared to be novel taxa, with 45% of the strains being psychrophilic. Elsewhere, Webster & Bourne (2007) investigated the bacterial community structure associated with the Antarctic soft coral, Alcyonium antarcticum. Phylogenetic analysis of isolates and retrieved sequences demonstrated close affiliation with known psychrophiles from the Antarctic environment.
Clone library construction and analysis removes the requirement and inherent bias present in culture studies. Although clone library construction and analysis imposes a bias of its own, using statistical techniques, it is possible to gain some idea of the extent of this bias and the proportion of the community that has been analysed, so that even without knowing the full diversity, it is still possible to compare relative diversity among communities (Hughes et al. 2001). Hughes et al. (2001) also concluded that in some habitats, diversity studies might require sample sizes of only 200–1000 clones to detect richness differences of only tens of species. Massana et al. (2000) screened more than 2000 clones from all over the world and concluded that the diversity of the marine planktonic Archaea is low and is comparable to that detected in previous studies based upon fewer samples. Further, molecular techniques are revealing extensive microbial diversity that was previously undetected. In a single year alone, more than 20 new divisions of bacteria were reported at the phylum or even kingdom level (Fuhrman & Campbell 1998).
The idea that microbial diversity cannot be estimated is derived from the fact that many microbial accumulation curves are linear or close to linear because of high diversity, small sample sizes or both (Hughes et al. 2001). Borneman & Triplett (1997) prepared a clone library of 16S rRNA genes recovered from an Amazonian soil by PCR, which yielded 100 unique sequences. Hughes et al. (2001) found that although estimators depend upon sample size, most of the richness estimates they investigated stabilized within the sample sizes available. Limitations associated with OTU definition are well documented and are described in Hughes et al. (2001) and discussed at length in Gentile et al. (2006).
Staley & Gosink (1999) reviewed sea ice bacteriology as a test case for examining bacterial diversity and biogeography, and it is clear from work with sea ice microbial communities that patterns have begun to emerge. For example, the dominance of gamma-Proteobacteria and Bacteroidetes along with reduced alpha-Proteobacteria is typical (Brown & Bowman 2001). Gentile et al. (2006), working with bacterial communities in Antarctic coastal waters, also found sequences related to gamma-Proteobacteria and Bacteroidetes groups, typical of the Antarctic sea ice bacterial communities. By analysing these patterns, clone library construction and analysis can therefore be used for biogeographical studies. For example, Abell & Bowman (2005) demonstrated significant differences in the total Flavobacteria spp. community structure and 16S rRNA gene diversity between samples from the polar front / Antarctic zone and those from the temperate and sub-Antarctic zones. Elsewhere, Bano et al. (2004) used clone library construction and analysis with Archaea-specific primers to investigate the phylogenetic composition of Arctic Ocean Archaeal assemblages and compare them to Antarctic assemblages.
A common feature of many Antarctic clone library studies, including the study reported here, is the high diversity and dominance of particular groups. Bowman et al. (2000) studied the diversity and community structure within anoxic sediment from marine salinity meromictic lakes and a coastal meromictic marine basin in the Vestfold Hills, Eastern Antarctica. They found that little similarity existed between the phylotypes detected in their study and other clone libraries based on marine sediment, suggesting that an enormous prokaryotic diversity occurs within marine and marine-derived sediments. Matsuzaki et al. (2006) used clone library analysis of 16S rRNA gene sequences to study DMSO in the environment and found that 38 of a total of 48 clones from water of the halocline were identified as Marinobacter. Indeed, these observations are not confined to work on the bacterioplankton. Gast et al. (2006) studied the biodiversity of protistan assemblages present in microhabitats of the Ross Sea, Antarctica. Sequencing of 18S clone libraries indicated genetically diverse collections of organisms in the water column, ice, and meltwater layer (slush), but a single small subunit ribosomal DNA (srDNA) sequence type dominated clone libraries from seawater and slush samples taken within the ice pack of this ecosystem. Elsewhere, López-García et al. (2001) recovered eukaryotes from the aphotic zone (250–3000 m deep) of the Antarctic polar front. They used ribosomal RNA genes to discover several new groups of bacteria and archaea, some of which, significantly, were abundant.
A small number of clones from the surface marine water around Southern Thule produced relatively consistent libraries with approximately 50% of common sequences between libraries. A total diversity of 183 OTUs was predicted and coverage calculated to be 61.2%. With 117 sequences actually identified (63.9% of the predicted 183), the actual and predicted levels of coverage were in close agreement. Comparing the independent clone libraries, eight matches occurred in each of the seven libraries, whilst 55 occurred in only one; this suggested that there might be a relatively small number of common dominant organisms, with a much larger underlying diversity. A sample of 346 clones (roughly four 96 well plates) will yield two-thirds of the total estimated diversity at the BLAST identification level and 50% of the total estimated diversity at the OTU identification level; 438 clones (roughly five 96 well plates) will yield three-quarters of the total estimated diversity at the BLAST identification level and two-thirds of the total estimated diversity at the OTU identification level. Beyond this, a relatively large number of clones were needed to improve coverage significantly, increasing at the rate of about 1 novel sequence per 100 clones at the BLAST identification level or 3.5 novel sequences per 100 clones at the OTU identification level. Whilst this approach does provide some preliminary indication of community composition, clone library construction and analysis should clearly be used in conjunction with other techniques such as direct culture, cell counts, FISH, DGGE and Q-PCR.
This research was supported by the Natural Environment Research Council through the British Antarctic Survey (LTMS programme). I would like to thank everybody on the RRS James Clark Ross for their help and support.