2019
DOI: 10.1101/791574
Preprint

Construction of habitat-specific training sets to achieve species-level assignment in 16S rRNA gene datasets

Abstract: Background: The low cost of 16S rRNA gene sequencing facilitates population-scale molecular epidemiological studies. Existing computational algorithms can parse 16S rRNA gene sequences to high-resolution Amplicon Sequence Variants (ASVs), which represent ecologically coherent entities. Assigning species-level taxonomy to these ASVs is the critical remaining barrier to drawing ecologically/clinically relevant inferences from and comparing data across 16S rRNA gene-based microbiota studies. Res…

Cited by 9 publications (15 citation statements)
References 81 publications
“…However, this limited approach does not reflect the complexity of the oral microbiota where around 700 species have been identified (12,13). On the other hand, the identification, at least at the species level, is highly desirable in 16S rRNA sequencing-based studies of the oral microbiota (14). This is because it has been demonstrated how different species from the same genus are associated with different oral conditions (15)(16)(17).…”
Section: Introduction (mentioning)
confidence: 99%
“…Schloss et al (Schloss 2021) found that, with a 97% similarity threshold and applying the OptiClust algorithm, 31.7%, 34.3% and 34.8% of the OTUs assessed had 16S rRNA amplicons from distinct species in the V3-V4, V4, and V4-V5 regions, respectively (Schloss 2021). However, these investigations did not focus on taxa inhabiting a specific environment, despite the importance of conducting 16S rRNA gene-based research using habitat-specific databases (Escapa et al 2020). Consequently, we used primer pairs targeting several gene regions (Regueira-Iglesias et al 2021.a) to determine the number of different oral-bacterial and oral-archaeal species with an ASI≥97%, as well as the potential clusters that might contain distinct species.…”
Section: Discussion (mentioning)
confidence: 99%
“…This would, however, result in both an overabundance of the single species representing the cluster and an underestimation of the diversity of the community, with other species within the OTU overlooked. Consequently, despite possible difficulties, it would be better to use the lowest possible level of resolution, i.e., the variant level (Callahan et al 2017), and databases specifically designed for taxonomic identifications of taxa at this level (Escapa et al 2020).…”
Section: Discussion (mentioning)
confidence: 99%
“…Sequence reference databases are often subsetted by investigators to focus on particular clades of interest or to perform additional curation of public datasets. Some researchers have generated environment-specific databases, founded in the belief that such databases increase taxonomic classification accuracy by removing sequences that are genetically related but ecologically distinct from species found in a specific environment [52,55–59], although this can elevate the risk of false-positive errors [76]. RESCRIPt contains several methods to support and evaluate such filtering decisions, which then become embedded in provenance to facilitate transparent and reproducible use of these databases.…”
Section: Reference Curation Improves Taxonomic Classification: Lesson (mentioning)
confidence: 99%
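
The filtering idea in the statement above (restricting a reference set to species known from one environment) can be illustrated with a minimal Python sketch. It is not RESCRIPt's API and not the preprint's method: the function names `species_of` and `subset_by_habitat`, the semicolon-delimited lineage format, and the toy records are all assumptions made for illustration.

```python
from typing import Dict, Iterable


def species_of(lineage: str) -> str:
    """Return the last non-empty rank of a semicolon-delimited lineage.

    Assumption: the species name is the final rank in the lineage string.
    """
    ranks = [rank.strip() for rank in lineage.split(";") if rank.strip()]
    return ranks[-1] if ranks else ""


def subset_by_habitat(
    reference: Dict[str, str], habitat_species: Iterable[str]
) -> Dict[str, str]:
    """Keep only reference entries whose species is on the habitat list."""
    keep = set(habitat_species)
    return {
        seq_id: lineage
        for seq_id, lineage in reference.items()
        if species_of(lineage) in keep
    }


# Toy records, invented for illustration only (hypothetical IDs and lineages).
reference = {
    "ref1": "Bacteria;Firmicutes;Streptococcus;Streptococcus mitis",
    "ref2": "Bacteria;Firmicutes;Streptococcus;Streptococcus thermophilus",
    "ref3": "Bacteria;Actinobacteria;Rothia;Rothia dentocariosa",
}
oral_species = ["Streptococcus mitis", "Rothia dentocariosa"]

oral_reference = subset_by_habitat(reference, oral_species)
print(sorted(oral_reference))  # → ['ref1', 'ref3']
```

As the statement itself cautions, a filter like this trades coverage for specificity: any organism missing from the habitat list can no longer be assigned correctly, which is one way the false-positive risk it mentions can arise.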
“…Additionally, issues with amplicon length and sequence heterogeneity can limit the ability to identify species, especially from short marker-gene sequences or metagenome fragments [51]. Hence, many researchers choose to perform additional curation to focus on type strains [52], quality filtering [14,53], or construct environment-specific databases that are constrained to contain species found within a given environment [52,54–59]. Database customization is also often performed to add new accessions that are absent in some database releases to increase database coverage [50], or to incorporate outgroups [14].…”
Section: Introduction (mentioning)
confidence: 99%