2021
DOI: 10.1038/s41598-020-80757-5

Scalable long read self-correction and assembly polishing with multiple sequence alignment

Abstract: Third-generation sequencing technologies can sequence long reads of tens of kbp, which are expected to solve various problems. However, these reads display high error rates, currently capped around 10%. Self-correction is thus regularly used in long-read analysis projects. We introduce CONSENT, a new self-correction method that relies on both multiple sequence alignment and local de Bruijn graphs. To ensure scalability, multiple sequence alignment computation benefits from a new and efficient segmentation strate…

Cited by 45 publications (47 citation statements)
References 37 publications
“…The correction of errors in sequencing data is fundamental to both the generation of initial data from a sequencer and to downstream analyses which assemble, map, and analyze genomes [28][29][30]. We introduce a transformer-based consensus generation method, reducing errors in PacBio HiFi reads by 42% and increasing yield of 99.9% accurate reads by 27%.…”
Section: Discussion (mentioning)
confidence: 99%
“…Vaser et al 2017; Walker et al 2014; Morisse et al 2021), will lead to new insights in metagenomics, especially with complete circular MAGs.…” (Source: bioRxiv preprint, version posted March 3, 2021, CC-BY-ND 4.0 International license; https://doi.org/10.1101/2021.03.03.433801)
Section: Discussion (mentioning)
confidence: 99%
“…For small genomes or specific genomic regions such as MHCs, we make use of a method for self-correction of long reads in each read group. This method combines a multiple sequence alignment (MSA) strategy with local de Bruijn graphs and is implemented in CONSENT [34]. When dealing with large genomes (e.g., Chr6), to increase efficiency, we use the MSA-based error correction modules built in MECAT2 [35] and NECAT [36] for PacBio CLR and Nanopore reads, respectively.…”
Section: Methods (mentioning)
confidence: 99%