2021
DOI: 10.1186/s13059-020-02244-4

Ratatosk: hybrid error correction of long reads enables accurate variant calling and assembly

Abstract: A major challenge to long read sequencing data is their high error rate of up to 15%. We present Ratatosk, a method to correct long reads with short read data. We demonstrate on 5 human genome trios that Ratatosk reduces the error rate of long reads 6-fold on average with a median error rate as low as 0.22 %. SNP calls in Ratatosk corrected reads are nearly 99 % accurate and indel calls accuracy is increased by up to 37 %. An assembly of Ratatosk corrected reads from an Ashkenazi individual yields a contig N50…

Cited by 49 publications (46 citation statements)
References 62 publications
“…Correct-then-assemble approaches like Canu 58 can be practical for smaller genomes 59, but on gigabase-sized mammalian genomes like in Bovinae we observed >20 Tb of peak temporary storage and >25k CPU hours for correcting only 30-fold ONT reads. Even recent reference-guided correction approaches like Ratatosk 60 still needed approximately 15k CPU hours to correct 55-fold ONT reads. Cutting-edge sequencing and bioinformatic improvements 61, 62, like the ONT Guppy5 basecaller, will likely assist more efficient assembly, resulting in higher QV and reduced computational load; however, currently the ONT specific requirements might be computationally prohibitive, especially when assembling many samples.…”
Section: Discussion (mentioning)
confidence: 99%
“…The long reads from MinION (Oxford Nanopore Technologies) were demultiplexed and base-called using Guppy v3.2.2 (high accuracy model) and subsequently were adaptor and quality (Q ≤ 8) trimmed using Porechop v0.2.2 (https://github.com/rrwick/Porechop) and BBDuk (https://sourceforge.net/projects/bbmap/) with default settings, respectively. The short reads were used to polish the long reads employing Ratatosk v0.7.0 [29] with default settings. The corrected long reads were then de novo assembled using Flye [30] v2.9 resulting in circular chromosomal contigs.…”
Section: Methods (mentioning)
confidence: 99%
“…Correct-then-assemble approaches like Canu (Koren et al, 2017) can be practical for smaller genomes (Wick & Holt, 2021), but on gigabase-sized mammalian genomes like in Bovinae we observed >20 Tb of peak temporary storage and >25k CPU hours for correcting only 30-fold ONT reads. Even recent reference-guided correction approaches like Ratatosk (Holley et al, 2021) still needed approximately 15k CPU hours to correct 55-fold ONT reads. Cutting-edge sequencing and bioinformatic improvements (Baid et al, 2021; Silvestre-Ryan & Holmes, 2021), like the ONT Guppy5 basecaller, will likely assist more efficient assembly, resulting in higher QV and reduced computational load; however, currently the ONT specific requirements might be computationally prohibitive, especially when assembling many samples.…”
Section: Discussion (mentioning)
confidence: 99%
“…Flye (Kolmogorov et al, 2019) (version 2.8.3-b1725) assemblies were constructed with "--genome-size=2.7g --nano-corr" from Ratatosk (version 0.4) (Holley et al, 2021) error-corrected nanopore reads. The nanopore reads were corrected using a reference-guided approach, taking the haplotype-specific hifiasm assembly as the reference.…”
Section: Genome Assembly (mentioning)
confidence: 99%