2012
DOI: 10.1186/1471-2105-13-s10-s6

Efficient error correction for next-generation sequencing of viral amplicons

Abstract: Background: Next-generation sequencing allows the analysis of an unprecedented number of viral sequence variants from infected patients, presenting a novel opportunity for understanding virus evolution, drug resistance and immune escape. However, sequencing in bulk is error prone. Thus, the generated data require error identification and correction. Most error-correction methods to date are not optimized for amplicon analysis and assume that the error rate is randomly distributed. Recent quality assessment of am…

Cited by 95 publications (75 citation statements)
References 27 publications
“…The obtained data sets were processed by the sequential application of the algorithms k-mer error correction (KEC) and a customized version of empirical threshold (ET) (31). Skums et al (31) have previously demonstrated this process to be highly accurate in finding true haplotypes and removing false haplotypes.…”
Section: Methods | Citation type: mentioning
confidence: 99%
“…In stage 1, the set of k-mers (substring of fixed length k) of reads from the processed data set is calculated and the distribution of frequencies of k-mers is analyzed (31). It was previously observed that the frequencies of erroneous and correct k-mers follow different distributions (32)(33)(34).…”
Section: Methods | Citation type: mentioning
confidence: 99%
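The stage 1 step quoted above (computing the k-mers of the reads and examining how their frequencies are distributed) can be illustrated with a minimal Python sketch. This is not the KEC implementation: the function names `count_kmers` and `frequency_spectrum`, the toy reads, and the choice of k = 4 are assumptions made purely for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        # A read shorter than k yields an empty range and contributes nothing.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def frequency_spectrum(kmer_counts):
    """Map each observed count to the number of distinct k-mers with that count."""
    return Counter(kmer_counts.values())


# Toy reads (hypothetical, not real sequencing data): the third read
# differs from the first two at a single position.
reads = ["ACGTAC", "ACGTAC", "ACGTTC"]
kmer_counts = count_kmers(reads, k=4)
spectrum = frequency_spectrum(kmer_counts)
```

In this toy input the k-mers that overlap the single differing base (`CGTT`, `GTTC`) occur once, while the others occur two or three times. That is the kind of separation between rare and common k-mers that the quoted passage refers to, although real data sets need a principled threshold rather than visual inspection.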
“…The resultant data files were sequentially processed through implementation of the k-mer error correction (KEC) and empirical threshold algorithms as previously described, using the parameters k = 25 and i = 3 (22,27). A panel of clonal sequences temporally matched to the UDPS data was used to further identify and correct homopolymer errors (22,25).…”
Section: Methods | Citation type: mentioning
confidence: 99%
“…The obtained reads were further processed with the program PRINSEQ, which depurates pyrosequencing data based on the read quality (Schmieder & Edwards, 2011). After this, the data were filtered by the error correction algorithm implemented in the program KEC, which corrects outlier sequences based on k-mer frequencies and identifies haplotypes based on the corresponding results (Skums et al, 2012). Given that KEC returns a corrected set of sequences, we did not use this output directly.…”
Citation type: mentioning
confidence: 99%