2020
DOI: 10.3897/mbmg.4.55815

Alignment-free classification of COI DNA barcode data with the Python package Alfie

Abstract: Characterization of biodiversity from environmental DNA samples and bulk metabarcoding data is hampered by off-target sequences that can confound conclusions about a taxonomic group of interest. Existing methods for isolation of target sequences rely on alignment to existing reference barcodes, but this can bias results against novel genetic variants. Effectively parsing targeted DNA barcode data from off-target noise improves the quality of biodiversity estimates and biological conclusions by limiting subsequ…

Citations: cited by 12 publications (22 citation statements)
References: 45 publications (49 reference statements)

“…Neural networks have been shown to perform well for taxonomic assignments of microbial sequences based on patterns within the DNA sequence (Busia et al, 2019) and can help test taxonomic assignment abilities at various taxonomic levels. Generally, the flexibility of neural networks and their ability to learn complex patterns from large numbers of short sequences make them a promising choice for solving the general sequence‐labeling problem in eDNA research (Nugent & Adamowicz, 2020).…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Similarly, an alternative align_to_ref() could be written, using the current function as a model, utilising a different alignment algorithm, while keeping it incorporated into the current workflow. Finally, while the barcode_clean() function does currently assess the data, based on a number of metrics, including statistical outliers and amino acid translation, there are additional methods for assessing molecular sequence data and flagging potentially inaccurate data points (Zhang et al 2012, Nugent and Adamowicz 2020, Fontes et al 2021). With some of these being built in R, the integration of these additional cleaning and verification methods can be placed into the MACER pipeline.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…CAOS [40], BLOG [10], DNABar [41], BRONX [27], DOME-ID [27] are some instances of this approach. There are alignment-free tools such as: ATIM-TNT (Tree-based) [27], CVTreeAlpha1.0 (Component Vector based) [42], Spectrum Kernel [43] and Alfie (python based) [44]. Web Based Tools such as: Linker [45], iBarcode [46], BioBarcode [47] and ConFind [48] are also there for DNA barcoding.…”
Section: Related Work
Citation type: mentioning
confidence: 99%