2018
DOI: 10.1101/383794
Preprint

Long-read amplicon denoising

† These authors contributed equally

Abstract: Long-read next generation amplicon sequencing shows promise for studying complete genes or genomes from complex and diverse populations. Current long-read sequencing technologies have challenging error profiles, hindering data processing and incorporation into downstream analyses. Here we consider the problem of how to reconstruct, free of sequencing error, the true sequence variants and their associated frequencies. Called "amplicon denoising", this problem has been extensi…

Cited by 8 publications (7 citation statements)
References 32 publications
“…Our results indicate long-read sequencing is remarkably consistent with short-read sequencing for detecting TprK variants, consistent with the increasing accuracy of long-read sequencing [48]. However, as long-read sequencing is still less accurate than short-read sequencing [25,49], we could not confidently call TprK variants present at lower frequencies (< 0.2%). Increasing sequencing depth may help resolve TprK variants present at frequencies less than 0.2%.…”
Section: Discussion; citation type: supporting
confidence: 68%
“…As long-read sequencing often produces more errors than short-read sequencing, we used a read clustering-based denoising approach [25] to identify probable sequences of the tprK amplicons for each isolate. We filtered Q20 PacBio CCS reads based on size (1,400 - 1,800 bp) prior to applying denoising and then required each cluster to contain a minimum of 5 reads.…”
Section: Results; citation type: mentioning
confidence: 99%
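The filtering described in this statement can be illustrated with a minimal sketch. It assumes reads are plain strings and that cluster membership has already been produced by some denoising step that is not shown; the function names, the inclusive size bounds, and the cluster labels are hypothetical and are not taken from the cited work.

```python
from collections import Counter

MIN_LEN, MAX_LEN = 1400, 1800  # size window quoted in the statement (bp)
MIN_CLUSTER_READS = 5          # minimum reads per cluster quoted in the statement


def filter_by_length(reads):
    """Keep reads whose length lies in the size window (bounds assumed inclusive)."""
    return [r for r in reads if MIN_LEN <= len(r) <= MAX_LEN]


def supported_clusters(cluster_labels):
    """Count reads per cluster label and keep clusters with enough reads.

    `cluster_labels` holds one hypothetical label per read, as some
    clustering step (not shown here) might assign them.
    """
    counts = Counter(cluster_labels)
    return {label: n for label, n in counts.items() if n >= MIN_CLUSTER_READS}


if __name__ == "__main__":
    reads = ["A" * 1399, "A" * 1400, "A" * 1800, "A" * 1801]
    print(len(filter_by_length(reads)))
    print(supported_clusters(["c1"] * 6 + ["c2"] * 4))
```

The quality filter (Q20) is left out because it depends on how per-read quality is stored, which the statement does not specify.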
“…We predict that a computational workflow based on the DADA2 method will continue to be effective for PacBio CCS amplicons extending out to ~3 kb but that sensitivity to low-frequency variants will degrade for >3 kb amplicons as the fraction of error-free reads declines. In the regime in which few or no sequences are expected to be error-free, alternative computational methods will be necessary and are now being developed (Kumar 2019). We urge caution in applying our computational methods to data from PacBio RSII sequencing chemistries before P6-C4 and/or CCS data that was generated by the earlier SMRT Portal software, as error rates in such data may be substantially higher than in the data considered here.…”
Section: Discussion; citation type: mentioning
confidence: 99%
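One way to see why the fraction of error-free reads falls as amplicons get longer is a simple independence model. The sketch below assumes every base is wrong independently with the same probability; that model and the example error rate are illustrative assumptions, not values reported in the statement.

```python
def error_free_fraction(length_bp, per_base_error):
    """Expected fraction of reads with zero errors.

    Assumes independent, identical per-base error probability, so the
    chance that all `length_bp` bases are correct is (1 - p) ** length_bp.
    """
    if not 0.0 <= per_base_error <= 1.0:
        raise ValueError("per_base_error must be a probability")
    return (1.0 - per_base_error) ** length_bp


if __name__ == "__main__":
    hypothetical_error = 0.001  # illustrative only, not a measured rate
    for length in (1500, 3000, 6000):
        print(length, error_free_fraction(length, hypothetical_error))
```

Under this model the error-free fraction decays geometrically with length, which matches the qualitative trend the statement describes.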
“…We also classified all PSD sequences as intact and defective by sequencing, using criteria described by the Pro-Seq IT tool associated with the PSD database, which include intactness thresholds for sequence length, mutations, and deletions. 13,36,37 We then compared the two classifications (ddPCR and sequencing) to quantify agreement. Of the 1,071 PSD sequences, 966 sequences (90.2%) agreed between the PSD algorithm and our ddPCR protocol (Figure 4A).…”
Section: Sensitivity Limit Of Detection (LoD) and Precision; citation type: mentioning
confidence: 99%
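The agreement figure quoted in this statement can be checked with a short calculation. The counts come from the statement; the function name is hypothetical.

```python
def percent_agreement(n_agree, n_total):
    """Percentage of items on which two classifications agree."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_agree / n_total


if __name__ == "__main__":
    total, agree = 1071, 966  # counts quoted in the statement
    print(round(percent_agreement(agree, total), 1))  # 90.2
    print(total - agree)  # sequences on which the two methods disagree: 105
```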