2022
DOI: 10.1111/1755-0998.13709

Optimal sequence similarity thresholds for clustering of molecular operational taxonomic units in DNA metabarcoding studies

Abstract: Clustering approaches are pivotal to handle the many sequence variants obtained in DNA metabarcoding data sets, and therefore they have become a key step of metabarcoding analysis pipelines. Clustering often relies on a sequence similarity threshold to gather sequences into molecular operational taxonomic units (MOTUs), each of which ideally represents a homogeneous taxonomic entity (e.g., a species or a genus). However, the choice of the clustering threshold is rarely justified, and its impact on MOTU over-spl…

Cited by 17 publications (16 citation statements)
References 63 publications
“…Owing to this algorithm, QCauto does not require a minimum similarity threshold to distinguish taxa (Tanabe & Toju, 2013). As a result, QCauto may be robust against splitting and lumping errors, which are unavoidable in ordinal similarity‐based clustering or discrimination methods with a single threshold value (Bonin et al, 2022). Furthermore, among the various assignment algorithms, QCauto returns the most reliable results when the completeness of the reference sequence database of all potentially observable species is low (Tanabe & Toju, 2013), as expected in the open ocean.…”
Section: Methods (mentioning)
confidence: 99%
“…Fifth, we clustered sequences at a threshold of 96% (Bact02, Euka02, Inse01), 95% (Fung02), 92% (Olig01) or 85% (Coll01) sequence similarity using the sumaclust program (https://git.metabarcoding.org/obitools/sumaclust/wikis/home). These thresholds were selected for each taxon on the basis of the distributions of pairwise sequence similarities within and between species of the same genus for each marker (Bonin et al, 2022). The selected thresholds allow us to minimize the risk that multiple individuals belonging to the same species are assigned to different molecular operational taxonomic units (MOTUs), while limiting the probability that distinct species are collapsed in one single MOTU (Bonin et al, 2022).…”
Section: Methods (mentioning)
confidence: 99%
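The per-marker clustering step quoted above can be illustrated with a minimal Python sketch. The marker names and thresholds are taken from the citation statement; everything else is an assumption for illustration: `similarity` uses Python's `difflib` as a stand-in score, and `greedy_cluster` is a hypothetical, simplified abundance-ordered procedure. It is not the algorithm or scoring that the sumaclust program implements.

```python
from difflib import SequenceMatcher

# Marker -> minimum similarity, as quoted in the citation statement above.
MARKER_THRESHOLDS = {
    "Bact02": 0.96, "Euka02": 0.96, "Inse01": 0.96,
    "Fung02": 0.95, "Olig01": 0.92, "Coll01": 0.85,
}

def similarity(a, b):
    # Stand-in similarity in [0, 1]; an assumption, not sumaclust's score.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()

def greedy_cluster(seqs_with_counts, threshold):
    """Visit sequences from most to least abundant; attach each one to the
    first cluster centre it matches at >= threshold, else open a new cluster."""
    clusters = []  # list of (centre_sequence, [member_sequences])
    for seq, _count in sorted(seqs_with_counts, key=lambda x: -x[1]):
        for centre, members in clusters:
            if similarity(seq, centre) >= threshold:
                members.append(seq)
                break
        else:
            clusters.append((seq, [seq]))
    return clusters

# Toy reads (hypothetical data): (sequence, read count).
reads = [
    ("ACGTACGTACGTACGTACGT", 10),
    ("ACGTACGTACGTACGTACGA", 3),
    ("TTTTTTTTTTGGGGGGGGGG", 5),
]
motus_strict = greedy_cluster(reads, MARKER_THRESHOLDS["Bact02"])
motus_loose = greedy_cluster(reads, MARKER_THRESHOLDS["Coll01"])
```

With these toy reads, the two near-identical sequences differ at one position out of 20, so they fall into separate MOTUs at the stricter threshold and into the same MOTU at the looser one.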
“…These thresholds were selected for each taxon on the basis of the distributions of pairwise sequence similarities within and between species of the same genus for each marker (Bonin et al, 2022). The selected thresholds allow us to minimize the risk that multiple individuals belonging to the same species are assigned to different molecular operational taxonomic units (MOTUs), while limiting the probability that distinct species are collapsed in one single MOTU (Bonin et al, 2022). Differences in thresholds are related to differences in taxonomic resolution across markers.…”
Section: Methods (mentioning)
confidence: 99%
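The trade-off described in the statement above (same-species sequences split across MOTUs versus distinct species collapsed into one MOTU) can be sketched in Python as follows. This is a hedged illustration: the function names, the toy similarity values, and the selection criterion (minimizing the sum of the two error rates) are assumptions, and the criterion is not claimed to be the exact one used by Bonin et al. (2022).

```python
def error_rates(intra, inter, threshold):
    """Return (over_split, over_merge) for a candidate threshold.

    intra: pairwise similarities between sequences of the same species.
    inter: pairwise similarities between different species of one genus.
    """
    # Same-species pairs below the threshold would land in different MOTUs.
    over_split = sum(s < threshold for s in intra) / len(intra)
    # Different-species pairs at or above the threshold would be merged.
    over_merge = sum(s >= threshold for s in inter) / len(inter)
    return over_split, over_merge

def pick_threshold(intra, inter, candidates):
    # Illustrative criterion (an assumption): minimize the summed error rates.
    return min(candidates, key=lambda t: sum(error_rates(intra, inter, t)))

# Toy similarity values (hypothetical, not from any real marker).
intra = [0.99, 0.98, 0.97, 0.96, 0.95]
inter = [0.96, 0.93, 0.92, 0.90, 0.85]
candidates = [0.90, 0.92, 0.94, 0.96, 0.98]

best = pick_threshold(intra, inter, candidates)
rates_at_best = error_rates(intra, inter, best)
```

A marker whose intra- and interspecific distributions overlap more would show higher error rates at every candidate threshold, which is consistent with the statement that thresholds differ with taxonomic resolution across markers.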
“…For instance, if freshwater biodiversity is analysed using primers amplifying bacteria, diatoms, molluscs, insects, fishes and amphibians, key taxa such as crustaceans and most microeukaryotes will remain undetected. Furthermore, integrating the results of multiple markers to obtain a coherent, homogeneous species lists can be challenging (Bonin et al, 2023; Jurburg et al, 2021; see Section 3.3).…”
Section: Potential Strategies For Exaustive Biodiversity Analyses Usi... (mentioning)
confidence: 99%