Background
In many environmental genomics applications, a homologous region of DNA from a diverse sample is first amplified by PCR and then sequenced. The next-generation sequencing technology 454 pyrosequencing has allowed much larger read numbers from PCR amplicons than ever before. This has revolutionised the study of microbial diversity, as it is now possible to sequence a substantial fraction of the 16S rRNA genes in a community. However, there is a growing realisation that, because of the large read numbers and the lack of consensus sequences, it is vital to distinguish noise from true sequence diversity in these data. Otherwise, noise leads to inflated estimates of the number of types or operational taxonomic units (OTUs) present. Three sources of error are important: sequencing error, PCR single-base substitutions and PCR chimeras. We present AmpliconNoise, a development of the PyroNoise algorithm that is capable of separately removing 454 sequencing errors and PCR single-base errors. We also introduce a novel chimera removal program, Perseus, that exploits the sequence abundances associated with pyrosequencing data. We use data sets where samples of known diversity have been amplified and sequenced to quantify the effect of each source of error on OTU inflation and to validate these algorithms.
Results
AmpliconNoise outperforms alternative algorithms, substantially reducing per-base error rates for both the GS FLX and the latest Titanium protocol. All three sources of error lead to inflation of diversity estimates. In particular, chimera formation has a hitherto unrealised importance, which varies according to amplification protocol. We show that AmpliconNoise allows accurate estimates of OTU number. Just as importantly, AmpliconNoise generates the right OTUs even at low sequence differences.
We demonstrate that Perseus has very high sensitivity, finding 99% of chimeras, which is critical when these are present at high frequencies.
Conclusions
AmpliconNoise followed by Perseus is a very effective pipeline for the removal of noise. In addition, the principles behind the algorithms, namely the inference of true sequences using Expectation-Maximization (EM) and the treatment of chimera detection as a classification or 'supervised learning' problem, will be equally applicable to new sequencing technologies as they appear.
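The abstract describes Perseus only as exploiting sequence abundances. A minimal sketch of that idea follows, assuming pre-aligned, equal-length sequences and plain Hamming distance: each sequence is tested as a single-breakpoint chimera of two strictly more abundant sequences, and is flagged when the two-parent explanation beats the best single parent by a hypothetical margin `min_gain`. This is not the Perseus implementation, which works on alignments and makes the final call as a classification problem; all function names here are illustrative.

```python
from typing import Dict, List, Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Mismatches between two pre-aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def best_chimera(query: str, parents: List[str]) -> Tuple[int, Optional[Tuple[int, int, int]]]:
    """Best single-breakpoint model: query[:bp] from one parent, query[bp:] from another.

    Returns (mismatches, (left_index, right_index, breakpoint)).
    """
    length = len(query)
    best: Tuple[int, Optional[Tuple[int, int, int]]] = (length + 1, None)
    for a, left_parent in enumerate(parents):
        # prefix[k] = mismatches of left_parent against query over positions < k
        prefix = [0] * (length + 1)
        for k in range(length):
            prefix[k + 1] = prefix[k] + (left_parent[k] != query[k])
        for b, right_parent in enumerate(parents):
            if a == b:
                continue
            # suffix[k] = mismatches of right_parent against query over positions >= k
            suffix = [0] * (length + 1)
            for k in range(length - 1, -1, -1):
                suffix[k] = suffix[k + 1] + (right_parent[k] != query[k])
            for bp in range(1, length):
                score = prefix[bp] + suffix[bp]
                if score < best[0]:
                    best = (score, (a, b, bp))
    return best


def is_chimera(query: str, abundance: int, pool: Dict[str, int], min_gain: int = 3) -> bool:
    """Flag query if two parents explain it at least min_gain mismatches better than one.

    Only sequences strictly more abundant than the query may act as parents.
    min_gain is a hypothetical threshold, not a Perseus parameter.
    """
    parents = [s for s, n in pool.items() if n > abundance and s != query]
    if len(parents) < 2:
        return False
    single = min(hamming(query, p) for p in parents)
    two, _ = best_chimera(query, parents)
    return single - two >= min_gain


pool = {"AAAAAAAAAA": 100, "CCCCCCCCCC": 80, "AAAAACCCCC": 5, "AAAAAAAAAC": 5}
print(is_chimera("AAAAACCCCC", 5, pool))  # True: half of one parent joined to half of the other
print(is_chimera("AAAAAAAAAC", 5, pool))  # False: one mismatch from a single parent
```

Restricting parents to more abundant sequences reflects the reasoning that a chimera is formed late in PCR from templates that are already common, so it should be rarer than both of its parents.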
We present an algorithm, PyroNoise, that clusters the flowgrams of 454 pyrosequencing reads using a distance measure that models sequencing noise. This clustering infers the true sequences in a collection of amplicons. We pyrosequenced a known mixture of microbial 16S rDNA sequences extracted from a lake and found that, without noise reduction, the number of operational taxonomic units is overestimated, but using PyroNoise it can be accurately calculated.
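To make "a distance measure that models sequencing noise" concrete, the sketch below scores an observed flowgram against the ideal homopolymer lengths of candidate sequences and assigns each read to the closest candidate. It assumes a hypothetical Gaussian noise model whose spread grows with homopolymer length (the constants 0.03 and 0.06 are illustrative, not PyroNoise's fitted values) and the 454 cyclic flow order TACG; PyroNoise itself uses an empirically derived noise distribution and an iterative clustering, so this shows only the flavour of the distance, not the published algorithm.

```python
import math
from typing import List, Sequence

FLOW_ORDER = "TACG"  # cyclic nucleotide flow order used by 454 instruments


def seq_to_flows(seq: str, order: str = FLOW_ORDER) -> List[int]:
    """Convert a DNA sequence into the ideal homopolymer length expected at each flow."""
    if any(base not in order for base in seq):
        raise ValueError("sequence contains a base that never appears in the flow order")
    flows: List[int] = []
    i = k = 0
    while i < len(seq):
        base = order[k % len(order)]
        run = 0
        while i < len(seq) and seq[i] == base:
            run += 1
            i += 1
        flows.append(run)
        k += 1
    return flows


def neg_log_lik(signal: float, n: int) -> float:
    """-log density of an observed flow signal given true homopolymer length n.

    Hypothetical Gaussian noise model: the standard deviation grows with n.
    """
    sigma = 0.03 + 0.06 * n
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (signal - n) ** 2 / (2 * sigma ** 2)


def flow_distance(flowgram: Sequence[float], ideal: Sequence[int]) -> float:
    """Average per-flow negative log-likelihood over the shared flows (lower is closer)."""
    m = min(len(flowgram), len(ideal))
    if m == 0:
        return math.inf
    return sum(neg_log_lik(f, n) for f, n in zip(flowgram[:m], ideal[:m])) / m


def assign(flowgrams: List[List[float]], ideals: List[List[int]]) -> List[int]:
    """Index of the closest candidate sequence for each flowgram."""
    return [min(range(len(ideals)), key=lambda j: flow_distance(f, ideals[j])) for f in flowgrams]


candidates = [seq_to_flows("TTAG"), seq_to_flows("TAG")]  # [[2, 1, 0, 1], [1, 1, 0, 1]]
reads = [[1.95, 1.05, 0.02, 0.97], [1.04, 0.96, 0.05, 1.02]]
print(assign(reads, candidates))  # [0, 1]
```

Working in flow space rather than base space matters because the dominant 454 error is miscounting homopolymer length; a likelihood over flow intensities penalises a signal of 1.95 very differently from a hard call of "TT" versus "T".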
Atlantic cod (Gadus morhua) is a large, cold-adapted teleost that sustains long-standing commercial fisheries and incipient aquaculture [1,2]. Here we present the genome sequence of Atlantic cod, showing evidence for complex thermal adaptations in its haemoglobin gene cluster and an unusual immune architecture compared with other sequenced vertebrates. The genome assembly was obtained exclusively by 454 sequencing of shotgun and paired-end libraries, and automated annotation identified 22,154 genes. The major histocompatibility complex (MHC) II is a conserved feature of the adaptive immune system of jawed vertebrates [3,4], but we show that Atlantic cod has lost the genes for MHCII, CD4 and Ii that are essential for the function of this pathway. Nevertheless, Atlantic cod is not exceptionally susceptible to disease under natural conditions [5]. We also find a highly expanded number of MHCI genes and a unique composition of its Toll-like receptor (TLR) families. This suggests how the Atlantic cod immune system has evolved compensatory mechanisms within both adaptive and innate immunity in the absence of MHCII. These observations affect fundamental assumptions about the evolution of the adaptive immune system and its components in vertebrates.
Background
Soil ecosystems harbor the most complex prokaryotic and eukaryotic microbial communities on Earth. Experimental approaches studying these systems usually focus on either the soil community's taxonomic structure or its functional characteristics. Many methods target DNA as the marker molecule and use PCR for amplification.
Methodology/Principal Findings
Here we apply an RNA-centered meta-transcriptomic approach to obtain information on both the structure and the function of a soil community simultaneously. Total community RNA is randomly reverse-transcribed into cDNA without any PCR or cloning step. Direct pyrosequencing produces large numbers of cDNA rRNA-tags; these are taxonomically profiled in a binning approach using the MEGAN software and two specifically compiled rRNA reference databases containing small- and large-subunit rRNA sequences. The pyrosequencing also produces mRNA-tags, which provide a sequence-based transcriptome of the community. One soil dataset of 258,411 RNA-tags of ∼98 bp length contained 193,219 rRNA-tags with valid taxonomic information, together with 21,133 mRNA-tags. Quantitative information about the relative abundance of organisms from all three domains of life and from different trophic levels was obtained in a single experiment. Less frequent taxa, such as soil Crenarchaeota, were well represented in the data set: they were identified by more than 2,000 rRNA-tags, and their in situ activity was revealed through the presence of mRNA-tags specific for enzymes involved in ammonia oxidation and CO2 fixation.
Conclusions/Significance
This approach could be widely applied in microbial ecology, efficiently linking community structure and function in a single experiment while avoiding biases inherent in other methods.
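The rRNA-tags are binned with MEGAN, which places each read on the lowest common ancestor (LCA) of the taxa it hits in the reference database. The sketch below illustrates that LCA idea under stated assumptions: hits are given as (root-to-leaf taxonomic path, bit score) pairs, and two filters loosely modelled on MEGAN's minimum-score and top-percent settings are applied, with default values that are hypothetical. It is not MEGAN's code or API, and the taxa in the example are illustrative only.

```python
from typing import List, Optional, Sequence, Tuple

Hit = Tuple[Sequence[str], float]  # (root-to-leaf taxonomic path, alignment bit score)


def lca_path(paths: List[Sequence[str]]) -> List[str]:
    """Longest shared prefix of several root-to-leaf taxonomic paths."""
    common: List[str] = []
    for ranks in zip(*paths):
        if all(r == ranks[0] for r in ranks):
            common.append(ranks[0])
        else:
            break
    return common


def assign_tag(hits: List[Hit], min_score: float = 35.0, top_percent: float = 10.0) -> Optional[List[str]]:
    """Bin one tag on the LCA of its near-best hits; None means unassigned.

    min_score and top_percent are hypothetical defaults for illustration.
    """
    good = [(path, score) for path, score in hits if score >= min_score]
    if not good:
        return None
    best = max(score for _, score in good)
    # keep only hits scoring within top_percent of the best hit
    kept = [path for path, score in good if score >= best * (1 - top_percent / 100)]
    return lca_path(kept)


hits = [
    (["Bacteria", "Proteobacteria", "Betaproteobacteria", "Nitrosomonas"], 80.0),
    (["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Nitrosococcus"], 76.0),
    (["Archaea", "Crenarchaeota"], 40.0),
]
print(assign_tag(hits))  # ['Bacteria', 'Proteobacteria']
```

Assigning to the LCA rather than to the single best hit is the conservative choice for short (∼98 bp) tags: when near-equal hits disagree below some rank, the tag is reported only at the rank they share.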