2022
DOI: 10.1093/bioinformatics/btac395

Shepherd: accurate clustering for correcting DNA barcode errors

Abstract: Motivation: DNA barcodes are short, random nucleotide sequences introduced into cell populations to track the relative counts of hundreds of thousands of individual lineages over time. Lineage tracking is widely applied, e.g. to understand evolutionary dynamics in microbial populations and the progression of breast cancer in humans. Barcode sequences are unknown upon insertion and must be identified using next-generation sequencing technology, which is error prone. In this study, we frame the …

Cited by 3 publications (4 citation statements)
References 17 publications (28 reference statements)
“…Moreover, some errors may be sequence-specific (see “ Structure of the Barcode Locus ” section), such that the naive approach may produce biased lineage frequency estimates. Fortunately, a number of error-correction techniques are available (e.g., Li and Godzik 2006 ; Edgar 2010 , 2016 ; Ghodsi et al 2011 ; James et al 2018 ; Wei et al 2021 ; Dasari and Bhukya 2022 ; Millán Arias et al 2022 ), some of which were developed specifically for barcode data (e.g., Zorita et al 2015 ; Zhao et al 2018 ; Tavakolian et al 2022 ).…”
Section: Identifying Barcodes In Sequencing Data (mentioning)
confidence: 99%
“…We selected six error-correction software, two developed for generic sequence data, DNAClust (Ghodsi et al 2011 ) and CD-Hit (Li and Godzik 2006 ), and four developed specifically for barcode data, Bartender (Zhao et al 2018 ), Starcode (Zorita et al 2015 ), Shepherd (Tavakolian et al 2022 ), and “Deletion-Correct,” a modified version of the algorithm used in (Johnson et al 2019 ). We tested their accuracy by performing error correction on a dataset of simulated barcode reads with realistic errors (Methods, “ Comparison of Error Correction Methods ” section).…”
Section: Identifying Barcodes In Sequencing Data (mentioning)
confidence: 99%
“…Moreover, some errors may be sequence-specific (see Section 2.1), such that the naive approach may produce biased lineage frequency estimates. Fortunately, a number of error-correction techniques are available (e.g., (Li & Godzik, 2006; Edgar, 2010, 2016; Ghodsi et al, 2011; James et al, 2018; Wei et al, 2021; Dasari & Bhukya, 2022; Millán Arias et al, 2022)), some of which were developed specifically for barcode data (e.g., (Zorita et al, 2015; Zhao et al, 2018; Tavakolian et al, 2022)).…”
Section: Error Correction (mentioning)
confidence: 99%
“…We selected six error-correction software, two developed for generic sequence data, DNAClust (Ghodsi et al, 2011) and CD-Hit (Li & Godzik, 2006), and four developed specifically for barcode data, Bartender (Zhao et al, 2018), Starcode (Zorita et al, 2015), Shepherd (Tavakolian et al, 2022) and "Deletion-Correct", a modified version of the algorithm used in Johnson et al (2019). We first tested their accuracy by performing error correction on a dataset of simulated barcode reads with realistic errors (Methods).…”
Section: Error Correction (mentioning)
confidence: 99%