2018
DOI: 10.1038/s41598-018-29325-6

Systematic evaluation of error rates and causes in short samples in next-generation sequencing

Abstract: Next-generation sequencing (NGS) is the method of choice when large numbers of sequences have to be obtained. While the technique is widely applied, varying error rates have been observed. We analysed millions of reads obtained after sequencing of one single sequence on an Illumina sequencer. According to our analysis, the index-PCR for sample preparation has no effect on the observed error rate, even though PCR is traditionally seen as one of the major contributors to enhanced error rates in NGS. In addition,…
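The abstract describes sequencing one known sequence many times. In that setup, the per-base error rate can be estimated by comparing every read against the reference. The sketch below is a minimal illustration under simplifying assumptions: reads are already trimmed to the reference length and aligned at position 0, and only substitutions are counted (no indels). The function name `per_base_error_rate` and the toy reads are hypothetical and are not the authors' pipeline.

```python
from collections import Counter


def per_base_error_rate(reference: str, reads: list[str]) -> tuple[float, Counter]:
    """Substitution error rate of reads against a single known reference.

    Assumption (hypothetical setup): every read has the reference length and
    starts at reference position 0, so position-wise comparison is valid.
    """
    mismatches = 0
    total_bases = 0
    # Mismatch counts per read position, to see whether errors cluster
    per_position: Counter = Counter()
    for read in reads:
        if len(read) != len(reference):
            raise ValueError("read length differs from reference; indels are not handled")
        for i, (ref_base, read_base) in enumerate(zip(reference, read)):
            total_bases += 1
            if read_base != ref_base:
                mismatches += 1
                per_position[i] += 1
    return mismatches / total_bases, per_position


reference = "ACGTACGTAC"
reads = ["ACGTACGTAC", "ACGTACGTAA", "ACCTACGTAC", "ACGTACGTAC"]
rate, positions = per_base_error_rate(reference, reads)
print(rate)  # 2 mismatches over 40 bases
print(dict(positions))
```

In real data the reads would first be aligned (allowing indels), and the per-position counts would show whether errors accumulate toward the read ends.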

Cited by 274 publications (225 citation statements)
References 35 publications
“…The resolution and accuracy achieved by our methodology derives in significant part from the exceptional and not-entirely-appreciated accuracy of modern PacBio CCS sequencing. We observed total error rates of 4.3 × 10⁻⁴ per nucleotide in PacBio CCS amplicon sequencing reads, significantly lower than the per-base error rates of common Illumina sequencing platforms (Schirmer 2016; Pfeiffer 2018). As a result, half of all sequencing reads were error-free over the entire ~1.5 kilobase (kb) 16S rRNA gene, and a computational approach leveraging repeated observations of error-free sequence was adaptable to PacBio CCS data (Callahan 2016; Callahan 2017).…”
Section: Discussion (mentioning; confidence: 90%)
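The statement that about half of the reads were error-free follows from the quoted error rate, provided errors are assumed to occur independently and uniformly along the read. Under that assumption, the probability that a read of length L contains no error is (1 − p)^L. Here is a minimal sketch; the function name is hypothetical. Real errors are not uniform (homopolymers, read ends), so this is only an approximation.

```python
def fraction_error_free(error_rate: float, length: int) -> float:
    """Expected fraction of reads with zero errors.

    Simplifying assumption: each base is wrong independently with the same
    probability `error_rate`.
    """
    return (1.0 - error_rate) ** length


# Values quoted above: 4.3e-4 errors per nucleotide over a ~1.5 kb 16S amplicon
print(round(fraction_error_free(4.3e-4, 1500), 3))
# For comparison, a per-base rate of 0.001 over the same length
print(round(fraction_error_free(1e-3, 1500), 3))
```

With p = 4.3 × 10⁻⁴ and L = 1,500, this gives roughly 0.52, in line with "half of all sequencing reads". At p = 0.001, the same formula gives only about 0.22.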
“…Namely, the presence of intracellular parasites (Wolbachia, Xiao et al., ), copies of nuclear mitochondrial DNA sequences (NUMTS, Hazkani‐Covo, Zeller, & Martin, ), gene introgression in hybrid species (Bachtrog, Thornton, Clark, & Andolfatto, ) and incomplete lineage sorting (Pollard, Iyer, Moses, & Eisen, ) are among the factors that might have affected our results and increased the intraspecific distance obtained in the taxa mentioned above. In addition, a very small percentage of the sequencing reads (~0.1%, Taberlet et al., ) might have been assigned to the wrong sample index during the sequencing process, although a recent study suggests that this is not the main cause of errors on Illumina platforms (Pfeiffer et al., ). The same study also reports that quality control of sequencing reads, such as that employed in this study, is capable of correcting such errors.…”
Section: Discussion (mentioning; confidence: 99%)
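The ~0.1% misassignment figure can be turned into a rough per-sample noise floor. A sample with N reads may receive on the order of N × 0.001 foreign reads, so variants observed at or below that count are candidates for being index artefacts. This is a hypothetical heuristic sketched for illustration, not the quality control used in the quoted studies. The function `flag_possible_misassignment`, the uniform-rate assumption and the toy counts are all assumptions.

```python
def flag_possible_misassignment(
    counts_by_sample: dict[str, dict[str, int]],
    misassignment_rate: float = 0.001,
) -> dict[str, list[str]]:
    """Flag variants whose count in a sample does not exceed a noise floor.

    Hypothetical heuristic: the floor is the sample's total read count times
    a single, uniform misassignment rate (0.1% by default).
    """
    flagged: dict[str, list[str]] = {}
    for sample, counts in counts_by_sample.items():
        total_reads = sum(counts.values())
        # Rough number of foreign reads expected in this sample under the assumption
        noise_floor = total_reads * misassignment_rate
        flagged[sample] = sorted(
            variant for variant, n in counts.items() if n <= noise_floor
        )
    return flagged


counts = {
    "sample_A": {"hapA": 9000, "hapB": 995, "hapC": 5},
    "sample_B": {"hapB": 2000, "hapA": 2},
}
print(flag_possible_misassignment(counts))
```

Flagged variants are only candidates. A real pipeline would weigh them against other evidence rather than discard them automatically.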
“…We simulated that these allegedly correct amplicons were sequenced with error rates between 0.001 and 0.01 per base, bracketing values published for HTS sequencers and, in particular, for the MiSeq platform (Schirmer et al. 2016; Pfeiffer et al. 2018). For simplicity, we assumed a constant error rate for all bases in a sequence, although we acknowledge that this is a simplification, as sequence features such as homopolymer regions make some positions more prone to errors (Taberlet et al. 2018).…”
Section: Simulation Analysis (mentioning; confidence: 99%)
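The simulation described in this excerpt applies a constant per-base error rate between 0.001 and 0.01. It can be sketched as follows, under these assumptions:

- substitution errors only;
- each erroneous base is replaced uniformly by one of the other three bases;
- a hypothetical 200 bp amplicon;
- a hypothetical function name, `simulate_reads`.

The quoted study's actual simulation code is not reproduced here.

```python
import random

BASES = "ACGT"


def simulate_reads(amplicon: str, n_reads: int, error_rate: float, seed: int = 0) -> list[str]:
    """Simulate reads with substitution errors at a constant per-base rate.

    Simplification: every position has the same error probability;
    indels and homopolymer effects are ignored.
    """
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        read = []
        for base in amplicon:
            if rng.random() < error_rate:
                # Replace with one of the three other bases, chosen uniformly
                read.append(rng.choice([b for b in BASES if b != base]))
            else:
                read.append(base)
        reads.append("".join(read))
    return reads


amplicon = "ACGT" * 50  # hypothetical 200 bp amplicon
results = {}
for rate in (0.001, 0.01):  # the bracketing range quoted above
    reads = simulate_reads(amplicon, n_reads=1000, error_rate=rate)
    error_free = sum(read == amplicon for read in reads) / len(reads)
    expected = (1 - rate) ** len(amplicon)
    results[rate] = (error_free, expected)
    print(f"rate={rate}: error-free observed {error_free:.3f}, expected {expected:.3f}")
```

Comparing the observed error-free fraction with (1 − p)^L is a quick check that the simulator applies the intended rate. A position-dependent rate (e.g. elevated in homopolymers) would replace the single `error_rate` with a per-position list.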