2020
DOI: 10.1101/2020.03.06.977975
Preprint

Long-read error correction: a survey and qualitative comparison

Abstract: Third generation sequencing technologies Pacific Biosciences and Oxford Nanopore Technologies were respectively made available in 2011 and 2014. In contrast with second generation sequencing technologies such as Illumina, these new technologies allow the sequencing of long reads of tens to hundreds of kbps. These so called long reads are particularly promising, and are especially expected to solve various problems such as contig and haplotype assembly or scaffolding, for instance. However, these reads are also…

Cited by 17 publications (13 citation statements)
References 95 publications
“…Our analysis proceeds on the basis that neither short read nor long read data can be assumed to provide an accurate reference genome, and so we seek to understand and characterize the degree of agreement between assembled sequence generated from each data source. Although error prone MinION sequence can be corrected using higher quality short read sequences 77 , we have deliberately kept the two sources of data separate so as to not introduce any positive bias in the calculation of the concordance statistics. The concordance statistic was developed to provide a straightforward screening procedure for identifying short read MAGs that are cognate to assembled genomes from long read data, by capturing information from alignment statistics.…”
Section: Discussion (mentioning)
confidence: 99%
“…Although these approaches were accurately detected breakpoints in most cases, we feel this part could be improved even more. One possibility for improvement would be to use other error correction techniques such as those using auxiliary short-read alignment 52 .…”
Section: Discussion (mentioning)
confidence: 99%
“…However, a recurrent issue with most error correction methods is that they do not retain the phasing of the reads, hence limiting the usage of corrected data to mixed-haplotype assembly. We provide here a short overview of hybrid correction methods and refer to genomic [19,21,22] and transcriptomic [23] LRS reads correction reviews for more details about self-correction methods.…”
Section: Previous Work (mentioning)
confidence: 99%
“…CoLoRMap takes advantage of the paired-end information to leap over regions of LRS reads where no SRS reads map. We refer to LRS reads correction reviews [19,21,22] for further information.…”
Section: Previous Work (mentioning)
confidence: 99%