2015
DOI: 10.1101/gr.183012.114

Accurate, multi-kb reads resolve complex populations and detect rare microorganisms

Abstract: Accurate evaluation of microbial communities is essential for understanding global biogeochemical processes and can guide bioremediation and medical treatments. Metagenomics is most commonly used to analyze microbial diversity and metabolic potential, but assemblies of the short reads generated by current sequencing platforms may fail to recover heterogeneous strain populations and rare organisms. Here we used short (150-bp) and long (multi-kb) synthetic reads to evaluate strain heterogeneity and study microor…

Cited by 121 publications (129 citation statements)
References 37 publications
“…In addition, our ability to track organisms through a series of data sets with stringent read-mapping would be hampered by the universally highly conserved regions in the 16S rRNA gene, whereas for ribosomal protein-encoding scaffolds, the level of divergence is more consistent across the full span of the sequence used. As for the selection of nucleotide identity to define taxonomic units tracked in this study, Konstantinidis et al (2006) showed that strains could robustly be defined based on >98% ANI, with species thus bounded by 95-98% ANI, whereas our previous work identified 98% ANI as a species boundary (Sharon et al, 2015). Hence, we have clustered organisms at 98% global sequence identity as a conservative definition of species, and as a threshold allowing robust mapping of reads for abundance estimates.…”
Section: Results (mentioning)
confidence: 91%
“…Species predictions were based on a prior analysis of ribosomal protein S3 (rpS3) divergence, which identified 98% and 90% nucleotide identities as thresholds for species and genera, respectively (Sharon et al, 2015). Expanding to genera, only an additional 10 organisms from the two data sets would be considered shared across the communities.…”
Section: Results (mentioning)
confidence: 99%
“…Aligning with RFA may enable determination of the correct haplotype paths of these variable regions. Our approach may also prove useful for subspecies discovery and quantification in metagenomic samples for which previous work had much higher sequencing requirements using TruSeq-assembled synthetic long reads (Sharon et al 2015). There has been previous work to leverage PacBio sequencing to accurately resolve RNA transcripts (Sharon et al 2013).…”
Section: Discussion (mentioning)
confidence: 99%
“…However, the loss of linkage information during the generation of short reads limits their utility. In particular, short reads are insufficient to phase the haplotypes of individuals within mixtures of similar sequences, including homeologous and homologous chromosomes in polyploids [3,4], viral quasispecies [5], multiply or alternatively spliced mRNA [6], genes from metagenomic samples containing related organisms [7,8], and immune antibody gene repertoires [9]. In these cases, additional information is required to determine whether mutations separated by distances longer than the read length are present in the same individual.…”
Section: Introduction (mentioning)
confidence: 99%