How many microorganisms are there on earth
The discrepancy persists even if currently estimated 16S mutation rates r or global cell counts N were off by 10 orders of magnitude, or even if global cell counts varied drastically. One explanation for this discrepancy could be that the evolution of the 16S-V4 region along a lineage is subject to strong constraints that favor some mutations or sequence variants over others, thus effectively reducing the "permissible" sequence space [ 82–84 ]. Alternatively, some processes not captured by the model may eliminate all but a small fraction of the 16S sequence variants emerging over time.
Phylogenetically correlated turnover, i. This would imply that extinction plays a central role in prokaryotic diversification, as recently suggested by [ 15 ] and contrasting common speculations that prokaryotic OTUs are unlikely to go extinct [ 1 , 86 — 88 ]. For example, at coarser phylogenetic resolutions e. Reciprocally, when we analyzed a subset of our data approximately 0.
This suggests that the global richness of exact sequence variants is at most an order of magnitude larger than the number of OTUs. The sequence length considered may also affect global richness measures. When combined with our V4-based richness estimates, this suggests that there exist 2. Unfortunately, while full-length sequencing undoubtedly improves phylogenetic resolution, technical complications and a higher cost currently prevent the wide adoption of full-length 16S sequencing in microbial community surveys.
Finally, we stress that 16S diversity only provides a coarse surrogate for prokaryotic genomic and phenotypic diversity [ 29 , 30 ], and it is probable that the global number of prokaryote ecotypes greatly exceeds the number of OTUs.
Cataloguing the phenotypic and genomic diversity of prokaryotes will undoubtedly be an important but much more challenging future task. Curtis and colleagues [ 2 ] hypothesized that experimental approaches to directly enumerating extant prokaryotic diversity would remain fruitless due to logistical challenges.
Our composite data set, covering a multitude of environments worldwide, enabled us to strongly constrain global prokaryotic OTU richness. Indeed, our global richness estimates are similar across a multitude of statistical estimators (Fig 1C and 1D), all of which are based on different models of OTU detection probabilities and, in most cases, use a different set of OTU incidence frequency counts.
The high fraction of 16S sequences from other amplicon- and metagenomic-sequencing surveys e. While no particular 16S similarity threshold provides an ideal species analog, OTUs provide an operational and clearly defined measure of richness that can be compared across studies, environments, and geological time [ 15 ]. We reiterate that the goal of the GPC was to enable a more robust estimate of total extant prokaryotic richness than previous studies.
Indeed, our estimates are based on an unprecedentedly large and environmentally broad composite sequencing data set, assembled from hundreds of studies utilizing alternative primers and alternative sampling techniques, and using a wide array of alternative statistical estimation methods for increased robustness. The GPC can thus facilitate future efforts to catalogue and phenotypically describe Earth's extant prokaryotes.
The GPC also opens up new avenues for reconstructing prokaryotic evolution over geological time using massive phylogenetic trees and for refining macroecological theories.
While long considered an unseen majority [ 79 ], thanks to ongoing technological revolutions, prokaryotes could one day become one of the most exhaustively characterized and best understood forms of life. Only Illumina sequences were downloaded to ensure sequence qualities on par with current standards and because Illumina-based studies typically achieve much deeper sequencing than studies using previous-generation technologies.
We only considered sequences covering the V4 hypervariable region for three reasons. First, use of the same gene region in all samples is necessary for clustering sequences into nonredundant OTUs.
Second, the V4 region is one of the most popular regions targeted in microbial surveys, including the EMP [ 22 ], making it easier to find publicly available data sets and allowing for comparison with the EMP. Third, the V4 region was shown to be the most suitable single hypervariable region for reconstructing bacterial phylogenetic relationships [ 24 ]. Studies were chosen to represent as wide an environmental spectrum as possible.
A total of 34, samples from studies were downloaded (description and accession numbers in S1 Data). Geographical sample locations (where available) are shown in S1 Fig. Sequencing data from the EMP [ 22 ] were omitted from the GPC for two reasons: this allowed us to use the EMP as an independent reference data set for assessing the fraction of OTUs rediscovered by the GPC, and the much shorter read lengths in the EMP ( bp on average, compared to bp on average in the GPC) would reduce the available phylogenetic resolution [ 92–96 ].
Paired-end reads with sufficient overlap were merged using flash v1. Of the insufficiently overlapping pairs, forward reads were kept and reverse reads discarded. Single-end reads, merged paired-end reads, and nonmerged forward reads were subsequently processed in the same way, as follows.
Reads were trimmed and quality filtered using vsearch v2. Any samples with more than 10^6 quality-filtered reads were subsampled down to 10^6 randomly chosen reads to reduce computational requirements; samples with fewer quality-filtered reads were not subsampled. The 1,,, kept reads were then chimera-filtered de novo using vsearch options --abskew 1. We chose cd-hit-otu because, in contrast to most other OTU-clustering algorithms, it scales relatively well to massive data sets such as ours.
For a comparison between cd-hit-otu and other clustering algorithms, we refer to [ 99 — ]. De novo clustering yielded 1,, clusters. Because primers of the various studies included did not all cover exactly the same regions and due to the clustering algorithm implemented by cd-hit-otu , a small number of clusters was redundant, i. To further avoid spurious i. The taxonomic identity of each OTU was determined based on its similarity to entries in the SILVA database [ 14 ] and by using a consensus approach, as follows.
In either case, the consensus taxonomy of a set of hits was defined as the taxon at the lowest taxonomic possible level e. Any OTUs identified as eukaryotes, chloroplasts, and mitochondria were omitted from subsequent analyses.
To calculate the fraction of prokaryotic 16S diversity recovered by the EMP [ 22 ] that was recaptured by the GPC, we proceeded as follows. EMP sequences were taxonomically identified using the same methods as for the GPC, and any sequences identified as eukaryotes, chloroplasts, or mitochondria were omitted.
An overview of recapture fractions is provided in S2 Table. To calculate the fraction of prokaryotic 16S diversity in the RDP release 11 [ 12 ] that was recaptured by the GPC, we proceeded as follows. Only sequences at least 1, bp long were kept. To calculate the fraction of 16S sequences from metagenome-assembled UBA genomes [ 55 ] that was recaptured by our GPC data set, we proceeded as follows.
Only UBA sequences longer than 1, bp were considered to increase the probability of adequate overlap with the V4 region, leaving us with sequences. UBA sequences were taxonomically identified using the same methods as for the GPC, and any sequences identified as eukaryotes, chloroplasts, or mitochondria were omitted. Aligned sequences were dealigned (gaps removed) and taxonomically identified as described above for the GPC, and any sequences identified as eukaryotes, chloroplasts, or mitochondria were omitted.
Unless otherwise mentioned, sequences in SILVA classified as eukaryotes, mitochondria, or chloroplasts were omitted from all analyses. We identified the first nucleotide position in the GPC alignments that had a gap fraction below 0. To limit computational requirements, we only considered a pseudo-randomly chosen subset of GPC studies (studies with paired-end reads and whose names started with the letters "A" through "G"), henceforth referred to as the "AG" subset.
This subset was chosen for convenience of file handling, and an alphabetical choice of projects is practically random for our purposes. Any samples with more than 10^6 raw reads were subsampled down to 10^6 randomly chosen reads to reduce computational requirements. This yielded ,, quality-filtered nonmerged paired-end reads. Error rate models were fitted using the DADA2 function learnErrors, separately for each study and separately for forward and reverse reads.
Only ASVs matched by at least two reads across all samples were kept for downstream analyses in order to eliminate spurious sequences. Because we were mainly interested to check if the number of detected ASVs would be substantially i. A summary of AG samples, including sequence accession numbers, is provided as S3 Data. Specifically, quality-filtered nonmerged paired-end reads, produced by the first step in the DADA2 pipeline, were used as input to the GPC clustering pipeline described above.
For a comparison of ASVs and sequence clusters obtained for various numbers of studies included, see S13 Fig. Accumulation curves of OTUs discovered, as a function of studies included, were calculated as follows. For any given number of studies N , we randomly chose N studies in the GPC and counted the number of OTUs detected in at least one of the chosen studies.
We repeated this step independent times and averaged the number of OTUs counted each time. By performing this process for various N from 1 to , we obtained the accumulation curves shown in Fig 1A and 1B. To estimate the total number of OTUs globally using the statistical estimators described in the main text (iChao2, ICE, CatchAll, breakaway, tWLRM), we considered each study as an independent sampling unit and counted the number of OTUs found in exactly one sampling unit (Q1), in exactly two sampling units (Q2), and so on.
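The resampling procedure behind the accumulation curves can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the `study_otus` mapping (study name to the set of OTUs detected in it), the default number of repeats, and the toy data are all assumptions made for the example.

```python
import random


def accumulation_curve(study_otus, n_repeats=100, seed=0):
    """Mean number of OTUs detected in at least one of N randomly
    chosen studies, for N = 1 .. number of studies.

    study_otus: hypothetical mapping {study name: set of OTU ids}.
    """
    rng = random.Random(seed)
    studies = list(study_otus.values())
    curve = []
    for n in range(1, len(studies) + 1):
        total = 0
        for _ in range(n_repeats):
            # Draw N distinct studies and count OTUs seen in any of them.
            chosen = rng.sample(studies, n)
            total += len(set().union(*chosen))
        curve.append(total / n_repeats)
    return curve


# Toy data (hypothetical): three studies, four OTUs in total.
example = {
    "study_A": {"otu1", "otu2"},
    "study_B": {"otu2", "otu3"},
    "study_C": {"otu1", "otu2", "otu3", "otu4"},
}
curve = accumulation_curve(example, n_repeats=200)
```

With all studies included, every repeat selects the same set, so the last point of the curve equals the total number of OTUs in the data set.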
Note that since our last quality filter, by which we only kept OTUs found in at least two samples of the same study, was applied separately for each study, every study can indeed be considered as an independent sampling unit. The assumption of the above estimators that sampling units are equivalent e.
To check whether our estimates are affected by this caveat, we also used a variant of iChao2 ("iChao2split"), whereby we randomly assigned studies to four complementary and equally sized groups and considered each group as a single independent global sampling unit.
Hence, iChao2split considered the number of OTUs found in only one study group (Q1), in exactly two study groups (Q2), in three study groups (Q3), and in all four study groups (Q4).
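To make the incidence frequency counts concrete, the sketch below tallies Q1, Q2, and so on from a list of sampling units, pools units into groups as in the split variant, and applies the classic Chao2 estimator. Note the substitution: classic Chao2 is a simpler relative of iChao2 and is used here only for illustration; it is not the estimator used in the study. All function names and the toy data are assumptions.

```python
import random
from collections import Counter


def incidence_frequency_counts(units):
    """Return {k: Q_k}: number of OTUs found in exactly k sampling units."""
    occurrences = Counter(otu for unit in units for otu in set(unit))
    return dict(Counter(occurrences.values()))


def pool_into_groups(units, n_groups, rng):
    """Randomly assign units to n_groups groups and pool the OTUs in each."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    return [set().union(*shuffled[i::n_groups]) for i in range(n_groups)]


def chao2(units):
    """Classic Chao2 richness estimate (NOT iChao2) from incidence data."""
    m = len(units)
    q = incidence_frequency_counts(units)
    s_obs = sum(q.values())
    q1, q2 = q.get(1, 0), q.get(2, 0)
    factor = (m - 1) / m
    if q2 > 0:
        return s_obs + factor * q1 * q1 / (2 * q2)
    # Bias-corrected form, used when no OTU occurs in exactly two units.
    return s_obs + factor * q1 * (q1 - 1) / 2


# Toy data (hypothetical): four sampling units.
units = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "c", "e"}, {"a"}]
counts = incidence_frequency_counts(units)
estimate = chao2(units)
groups = pool_into_groups(units, 2, random.Random(1))
```

In the toy data, OTUs d and e occur in one unit, b and c in two, and a in all four, so Q1 = 2, Q2 = 2, and Q4 = 1.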
The splitting was randomly repeated times, and the obtained estimates were averaged (Fig 1E); the standard error was set to the standard deviation of estimates across repeated splittings. We mention that analogous estimators exist e. Such abundance-based estimators are not suited for our data set for two reasons: first, to obtain a single globally ranging reference sample, we would need to pool all GPC samples so as to obtain a measure of abundance for the various OTUs.
However, read counts from separate amplicon-sequencing samples cannot be combined to obtain a measure of global OTU abundances since the total number of cells that was present in each sample is unknown and sequencing depths varied between samples. Second, typical abundance-based estimators such as iChao1 rely on knowing the number of singleton OTUs i.
Note that this filter corresponds to increasing the OTU detection threshold in each study, just as sequencing depth affects detection thresholds. Since the incidence-based richness estimators used in this study all account for finite a priori unknown and potentially variable detection probabilities, their applicability is not expected to be substantially compromised by a systematic application of this filter.
This is roughly analogous to performing a mark-recapture-based assessment of wildlife population size; a systematic decrease of capturing effort may increase the variance of the resulting estimate, but it will not affect the expected value of that estimate. Using the fact that the total estimated probability of hitting an OTU with zero reads in the GPC (P0) is not greater than P1 (it is more probable to rehit some OTU with one read than to hit some OTU with zero reads) and the fact that , we obtain the lower bound.
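For reference, the Good-Turing frequency formula can be sketched in its textbook form: the total probability mass of all OTUs seen exactly r times is estimated as (r + 1) N_{r+1} / N, where N_{r+1} is the number of OTUs with exactly r + 1 reads and N is the total number of reads. The snippet below is a generic illustration under that assumption; it does not reproduce the specific lower bound derived in the text, and the function name and toy counts are hypothetical.

```python
def good_turing_mass(read_counts, r):
    """Good-Turing estimate of the total probability of hitting any OTU
    that was seen exactly r times: (r + 1) * N_{r+1} / N.

    read_counts: hypothetical list of per-OTU read counts.
    """
    total_reads = sum(read_counts)
    n_next = sum(1 for c in read_counts if c == r + 1)
    return (r + 1) * n_next / total_reads


# Toy data (hypothetical): seven OTUs, 23 reads in total.
read_counts = [1, 1, 2, 2, 2, 5, 10]
p0 = good_turing_mass(read_counts, 0)  # mass of OTUs not yet seen
p1 = good_turing_mass(read_counts, 1)  # mass of OTUs seen exactly once
```

Here p0 depends only on the number of singletons and p1 only on the number of OTUs with exactly two reads, which is why the N2 counts matter once singletons have been filtered out.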
An overview of computed probabilities for various clustering thresholds is given in S7 Table. We note that the Good—Turing frequency estimator is widely used in biological statistics and has been repeatedly shown to be more robust than simply using the fraction of assigned reads [ , ]. We emphasize that we calculated MRAs separately for each sample, even though MRAs from shallower sequenced samples may be less accurate.
This approach was preferred over the alternative of simply calculating the fraction of reads assigned to an OTU when all samples are pooled because samples differ drastically in sequencing depth; thus, OTUs that happen to occur in deeply sequenced samples would appear to be more abundant than OTUs in shallowly sequenced samples.
Similarly, pooling within studies was also avoided because sequencing depth varied widely even among samples of the same study, and samples were usually not technical replicates; hence, MRAs calculated for a given study after pooling would be biased toward organisms that happened to be present in deeply sequenced samples. By calculating MRAs separately for each sample prior to averaging, we avoid biases toward OTUs in more deeply sequenced samples.
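The difference between per-sample averaging and pooling can be illustrated with a small sketch. It assumes, for the example only, that the mean relative abundance (MRA) of an OTU is the average of its read fractions over all samples, with absent OTUs counted as zero; the function names and the two toy samples are hypothetical.

```python
def mean_relative_abundances(samples):
    """Average each OTU's read fraction across samples (absent = 0).

    samples: hypothetical list of non-empty {OTU id: read count} dicts.
    """
    otus = set().union(*(s.keys() for s in samples))
    mra = {}
    for otu in otus:
        fractions = [s.get(otu, 0) / sum(s.values()) for s in samples]
        mra[otu] = sum(fractions) / len(fractions)
    return mra


def pooled_fractions(samples):
    """Read fraction of each OTU after pooling all samples together."""
    totals = {}
    for s in samples:
        for otu, count in s.items():
            totals[otu] = totals.get(otu, 0) + count
    n_reads = sum(totals.values())
    return {otu: count / n_reads for otu, count in totals.items()}


# Toy data (hypothetical): one shallow and one deep sample.
shallow = {"x": 9, "y": 1}
deep = {"x": 1000, "y": 9000}
mra = mean_relative_abundances([shallow, deep])
pooled = pooled_fractions([shallow, deep])
```

In this toy case both OTUs have an MRA of 0.5, whereas pooling assigns about 90% of reads to OTU y simply because it dominates the deeply sequenced sample.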
We note that the resulting frequency histogram should not be interpreted as a true OTU abundance distribution because it only includes OTUs discovered by the GPC and may thus be artificially positively skewed [ ]. We randomly removed half of the quality- and chimera-filtered reads and repeated the OTU clustering and analyses described above, thus obtaining a rarefied variant of the GPC (rGPC). Specifically, we assumed that the number of reads assigned to an OTU in any given MRA interval was Poisson-distributed and that the probability of being discovered was given by the probability of being matched by at least two reads, i.
Fitting was performed via least-squares. The fitted log-normal model was integrated over the entire real axis to obtain an estimate for the total number of extant prokaryotic OTUs.
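The detection model described above (Poisson-distributed read counts, with discovery requiring at least two reads) implies a discovery probability of 1 - e^(-λ)(1 + λ) for an OTU with expected read count λ. The sketch below evaluates this expression; how λ is derived from an MRA interval is not shown in the text, so the argument is treated here as a given, hypothetical input.

```python
import math


def detection_probability(expected_reads):
    """P(at least two reads) for a Poisson-distributed read count.

    expected_reads: hypothetical Poisson mean (lambda) for one OTU.
    Equals 1 - P(0 reads) - P(1 read) = 1 - exp(-lam) * (1 + lam).
    """
    return 1.0 - math.exp(-expected_reads) * (1.0 + expected_reads)


# Rare OTUs are almost never discovered; abundant ones almost always.
p_rare = detection_probability(0.1)
p_common = detection_probability(10.0)
```

Dividing an observed OTU count in an abundance interval by this probability is one way to correct for undetected OTUs before fitting a distribution, though the exact fitting procedure used by the authors is not recoverable from the text.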
Only samples with publicly accessioned latitude and longitude information are shown (25, samples in A; 4, samples in B). Frequency histograms of the number of samples (top row) and the number of studies (bottom row) in which each GPC OTU was found, for bacteria (left column) and archaea (right column). In A and B, the left-most bar refers to a number of samples equal to two.
Error bars indicate standard errors, estimated from the underlying models; most standard errors are likely underestimated by the models, so the variability between models is probably a more honest assessment of uncertainty. Only phyla including at least 10 entries in SILVA release , set NR99 and estimated to contain at least 10 extant clusters are shown.
Only classes including at least 10 entries in SILVA release , set NR99 and estimated to contain at least 10 extant clusters are shown.
Only a subset of studies was used in this analysis (subset "AG"). Figures A and B contain the same information, shown in alternative ways. Note that the horizontal axis shows similarities in A and distances in B. Also see S15 Fig for a comparison with exact amplicon sequence variants. Clusters were generated from a subset of studies (subset "AG"). The last row lists the number of clusters discovered by the GPC.
NA indicates that the estimator did not converge. Number of 16S sequence clusters in the GPC with exactly two reads (N2) and probability that a single additional amplicon sequence would hit a GPC cluster (P, estimated using the Good-Turing frequency formula; see Methods for details) for various clustering similarities. Abstract The global diversity of Bacteria and Archaea, the most ancient and most widespread forms of life on Earth, is a subject of intense controversy.
Author summary The global diversity of Bacteria and Archaea "prokaryotes" , the most ancient and most widespread forms of life on Earth, is subject to high uncertainty.
Introduction Microorganisms are the most ancient and the most widespread form of life on Earth, inhabiting virtually every ecosystem and driving the bulk of global biogeochemical cycles. Results and discussion The GPC covers the bulk of global 16S diversity To ensure maximal phylogenetic coverage, the raw sequencing data from each study was considered as input to our analyses.
Eliminating potential caveats While our statistical richness estimators (Fig 1C and 1D) were designed to account for variable detection probabilities among OTUs, the potential risk of neglecting a large number of extremely rare OTUs cannot be overemphasized.
Most prokaryotic OTUs are globally distributed When we repeated our analyses using only studies from the Americas or near American coasts ( studies across 14 countries, see map in S1 Fig) instead of the full GPC, OTU discovery rates for any given number of studies remained almost unchanged (Fig 1A and 1B).
Taxon-specific diversities and coverages in databases Our census allows an unprecedentedly precise assessment of the diversity covered by existing 16S databases such as SILVA [ 14 ] or the RDP [ 12 ]. Implications Our work suggests that global prokaryotic OTU richness is about six orders of magnitude lower than previously predicted via extrapolation of diversity scaling laws and OTU abundance distributions fitted to individual microbial communities [ 6 , 8 ].
Supporting information. S1 Data. Sample summary and accession numbers. S2 Data. OTU incidence frequency tables. OTU, operational taxonomic unit. S3 Data. Sample summary and accession numbers for AG subset. S1 Text. The pitfalls of extrapolating host-specific microbial diversity estimates. S2 Text. An upper bound for the number of extant OTUs at steady state. S1 Fig. Sample locations. S2 Fig. Samples and studies per OTU. S3 Fig. S4 Fig. S5 Fig. Prokaryotic richness estimates Americas versus globally.
S6 Fig. S7 Fig. S8 Fig. S9 Fig. S10 Fig. S11 Fig. S12 Fig. S13 Fig. S14 Fig.
Because they are so widely distributed and microscopic, counting all the bacteria on the face of the earth is an impossible task. Estimating these numbers, however, is feasible. Bacteria can be found living in nearly every habitat on the face of the earth, regardless of how seemingly inhospitable.
Millions of bacteria fill the guts of humans and other animals and cover the surfaces of plant roots. Bacteria have been found in the deepest parts of the ocean, seven miles under the surface, and as high as 40 miles into the atmosphere.
Many species of bacteria can withstand harsh conditions, including extreme heat, cold, and salinity. William Whitman and his team at the University of Georgia estimated the number of bacteria living on the earth by examining different habitat types and estimating the numbers in each separately.
Habitat types included organisms, water (freshwater and oceans), and soils. These habitats were broken down into smaller categories when necessary (for example, forest soils versus non-forest soils), and direct bacterial counts were often made. When direct counts were not possible, estimates were made based on published literature. The number of bacteria on earth is estimated to be 5,000,000,000,000,000,000,000,000,000,000 (5 × 10^30).
The domain was proposed by the microbiologist and physicist Carl Woese and is based on identifying similarities in ribosomal RNA sequences of microorganisms.
The second largest group is called a kingdom. Five major kingdoms have been described, including prokaryota. A kingdom is further split into phylum or division, class, order, family, genus, and species, which is the smallest group.
The science of classifying organisms is called taxonomy, and the groups making up the classification hierarchy are called taxa. Taxonomy consists of classifying new organisms or reclassifying existing ones. Microorganisms are scientifically named using binomial nomenclature: two words that refer to the genus and the species. The names assigned to microorganisms are in Latin. The first letter of the genus name is always capitalized.
Classification of microorganisms has been largely aided by studies of fossils and recently by DNA sequencing. Methods of classifications are constantly changing.
The most widely employed methods for classifying microbes are morphological characteristics, differential staining, biochemical testing, DNA fingerprinting or DNA base composition, polymerase chain reaction, and DNA chips.
Assess the characteristics of pre-life earth and which adaptations allowed early microbial life to flourish. Scientific evidence suggests that life began on Earth some 3. Since then, life has evolved into a wide variety of forms, which biologists have classified into a hierarchy of taxa. Some of the oldest cells on Earth are single-cell organisms called archaea and bacteria. Fossil records indicate that mounds of bacteria once covered young Earth.
Some began making their own food using carbon dioxide in the atmosphere and energy they harvested from the sun.
Soon afterward, new oxygen-breathing life forms came onto the scene. With a population of increasingly diverse bacterial life, the stage was set for more life to form. There is compelling evidence that mitochondria and chloroplasts were once primitive bacterial cells. This evidence is described in the endosymbiotic theory.
Symbiosis occurs when two different species benefit from living and working together. The endosymbiotic theory describes how a large host cell and ingested bacteria could easily become dependent on one another for survival, resulting in a permanent relationship.
Over millions of years of evolution, mitochondria and chloroplasts have become more specialized, and today they cannot live outside the cell. Mitochondria and chloroplasts have striking similarities to bacterial cells. Both organelles have their own DNA, which they use to produce many of the proteins and enzymes required for their function. A double membrane surrounding both mitochondria and chloroplasts is further evidence that each was ingested by a primitive host.
The two organelles also reproduce like bacteria, replicating their own DNA and directing their own division. Mitochondrial DNA (mtDNA) is passed down directly from mother to child, and it accumulates changes much more slowly than other types of DNA. Because of its unique characteristics, mtDNA has provided important clues about evolutionary history. For example, differences in mtDNA are examined to estimate how closely related one species is to another.
Conditions on Earth 4 billion years ago were very different from what they are today. The atmosphere lacked oxygen, and an ozone layer did not yet protect Earth from harmful radiation. Heavy rains, lightning, and volcanic activity were common. Yet the earliest cells originated in this extreme environment. Extremophile archaea still thrive in extreme habitats. Astrobiologists are now using archaea to study the origins of life on Earth and other planets.
Because archaea inhabit places previously considered incompatible with life, they may provide clues that will improve our ability to detect extraterrestrial life.
Interestingly, current research suggests archaea may be capable of space travel by meteorite. Such an event, termed panspermia, could have seeded life on Earth or elsewhere. The presence of archaea and bacteria changed Earth dramatically. They helped establish a stable atmosphere and produced oxygen in such quantities that life forms that needed oxygen could eventually evolve. The new atmospheric conditions calmed the weather so that the extremes were less severe.
Life had created the conditions for new life to be formed. This process is one of the great wonders of nature. Microbes are ubiquitous on Earth and their diversity and abundance are determined by the biogeographical habitat they occupy.
Summarize how microbial diversity contributes to microbial occupation of diverse geographical niches. The microbial world encompasses most of the phylogenetic diversity on Earth, as all Bacteria, all Archaea, and most lineages of the Eukarya are microorganisms.