2022
DOI: 10.1111/1755-0998.13746

Navigating the seven challenges of taxonomic reference databases in metabarcoding analyses

Abstract: Assessment of biodiversity using metabarcoding data, such as from bulk or environmental DNA sampling, is becoming increasingly relevant in ecology, biodiversity sciences and monitoring. Thereby, the taxonomic identification of species from their DNA sequences relies strongly on reference databases that link genetic sequences to taxonomic names. These databases vary in completeness and availability, depending on the taxonomic group studied and the genetic region targeted. The incompleteness of reference databas…

Citations: cited by 41 publications (50 citation statements)
References: 133 publications (123 reference statements)
“…For example, OTU 142_141 is unclassified but plays an important role for the DNA model performance (Figure 4). Several reasons can explain the failure of the classification of these sequences, such as incompleteness or conflicts in the reference database (Keck et al, 2023) but these data remain nonetheless valuable and machine learning algorithms can be used to incorporate them. We here highlight the value of using complete metabarcoding data for the classification and assessment of ecological state, and advocate approaches that are not perpetuating previous limitations inherent to a past method yet obsolete to novel approaches.…”
Section: Discussion (mentioning)
confidence: 99%
“…Several reasons can explain the failure of the classification of these sequences, such as incompleteness or conflicts in the reference database (Keck et al, 2023) but these data remain nonetheless valuable and machine learning algorithms can be used to incorporate them.…”
Section: Discussion (mentioning)
confidence: 99%
“…Linking data to a particular taxon is thus highly dependent on the availability of reliable reference sequences, encompassing the whole range of genetic variability. The completeness and accuracy of reference databases is a well‐known challenge for barcoding and metabarcoding studies, as described in Keck et al (2023). Although some methods exist to curate reference databases and remove erroneous sequences, the extant of problematic or missing references is usually difficult to evaluate.…”
Section: Issue #1: the Challenges Of Retrieving A Species‐specific Da... (mentioning)
confidence: 99%
“…Linking data to a particular taxon is thus highly dependent on the availability of reliable reference sequences, encompassing the whole range of genetic variability. The completeness and accuracy of reference databases is a well-known challenge for barcoding and metabarcoding studies, as described in Keck et al (2023). Although some methods exist to curate reference databases and remove erroneous sequences, the extant of problematic or missing references is usually difficult to evaluate. Unlike the absence of references for a particular species, the absence of particularly divergent genetic variants of a target species [Figure 3: Representation of the differences between the genetic information retrieved with either traditional individual-based or eDNA sampling in two fish populations of the same river.]…”
mentioning
confidence: 99%