2020
DOI: 10.1101/2020.07.15.204925
Preprint

Ratatosk – Hybrid error correction of long reads enables accurate variant calling and assembly

Abstract: Motivation: Long Read Sequencing (LRS) technologies are becoming essential to complement Short Read Sequencing (SRS) technologies for routine whole genome sequencing. LRS platforms produce DNA fragment reads thousands to millions of bases long, allowing the resolution of numerous uncertainties left by SRS reads for genome reconstruction and analysis. In particular, LRS characterizes long and complex structural variants that SRS misses because of its short read length. Furthermore, assemblies produced with LRS reads ar…

Cited by 13 publications (15 citation statements)
References 59 publications
“…All assemblers were run with default parameters (flagging raw or corrected reads depending on read input, Raven was run with the --weaken flag when corrected reads were used). Additional Flye assemblies were performed using both Canu and NECAT self-corrected read sets and an additional short-read corrected read set corrected with Ratatosk version 0.1 [49], in order to assess read correction strategy performance. The Ratatosk corrected reads were Canu trimmed using the same settings as for the FMLRC corrected read set.…”
Section: Validation of Assembly and Comparison of Long Read Assembler Performance (mentioning, confidence: 99%)
“…Assembling the full dataset with hifiasm would yield an oversized assembly (258 Mb without overlaps). Low-accuracy long reads (ONT or PacBio CLR) were corrected with the Illumina reads using Ratatosk [17] to generate high-accuracy long reads, with default parameters. These corrected long reads were assembled with Flye using the same parameters as for HiFi assemblies.…”
Section: Adineta vaga Assemblies (mentioning, confidence: 99%)
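The statement above treats Ratatosk as a black box: Illumina reads in, corrected long reads out. Ratatosk's own algorithm is not described on this page, so the sketch below swaps in a much simpler, generic technique, k-mer spectrum correction of substitution errors, only to illustrate what "hybrid" correction means: k-mers seen often enough in the accurate short reads are trusted ("solid"), and a long-read base is replaced when doing so makes unsupported k-mers solid. It is a toy under stated assumptions: it ignores insertions and deletions, which make up a large share of ONT and PacBio CLR errors, and all function names (`kmer_counts`, `solid_kmers_covering`, `correct_substitutions`) are hypothetical.

```python
from collections import Counter


def kmer_counts(short_reads, k):
    """Count every k-mer in the short reads (assumed to be accurate)."""
    counts = Counter()
    for read in short_reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def solid_kmers_covering(seq, pos, k, counts, min_count):
    """Number of k-mers in seq that contain position pos and are solid."""
    first = max(0, pos - k + 1)
    last = min(pos, len(seq) - k)
    return sum(counts[seq[i:i + k]] >= min_count for i in range(first, last + 1))


def correct_substitutions(long_read, counts, k, min_count=2):
    """Greedy left-to-right correction of substitution errors only.

    Hypothetical toy corrector, not Ratatosk's method. Indels are not handled.
    """
    seq = long_read
    for i in range(1, len(seq) - k + 1):
        prev_solid = counts[seq[i - 1:i - 1 + k]] >= min_count
        if prev_solid and counts[seq[i:i + k]] < min_count:
            # Only the last base of k-mer i is absent from the solid k-mer i-1,
            # so it is the prime suspect for a substitution error.
            pos = i + k - 1
            best = seq[pos]
            best_score = solid_kmers_covering(seq, pos, k, counts, min_count)
            for base in "ACGT":
                candidate = seq[:pos] + base + seq[pos + 1:]
                score = solid_kmers_covering(candidate, pos, k, counts, min_count)
                if score > best_score:
                    best, best_score = base, score
            seq = seq[:pos] + best + seq[pos + 1:]
    return seq


# Toy usage: error-free "short reads" tiled over a small genome, and a
# "long read" carrying one substitution (G -> T at position 9).
genome = "ACGTTGCAGGCTAACTGA"
short_reads = [genome[i:i + 10] for i in range(0, len(genome) - 9, 2)]
long_read = genome[:9] + "T" + genome[10:]
counts = kmer_counts(short_reads, k=5)
# min_count=1 only because the toy short reads contain no errors.
print(correct_substitutions(long_read, counts, k=5, min_count=1) == genome)  # True
```

Scanning left to right means the first weak k-mer after a solid one pins the suspect base to its last position; scoring each candidate base against all k-mers covering it, rather than just one, reduces spurious fixes when a (k-1)-mer occurs more than once in the genome.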
“…All assemblers were run with default parameters (flagging raw or corrected reads depending on read input, Raven was run with the --weaken flag when corrected reads were used). Additional Flye assemblies were performed using both Canu and NECAT self-corrected read sets and an additional short-read corrected read set corrected with Ratatosk version 0.1 (40), in order to assess read correction strategy performance. The Ratatosk corrected reads were Canu trimmed using the same settings as for the FMLRC corrected read set.…”
Section: Validation of Assembly and Comparison of Long Read Assembler (mentioning, confidence: 99%)
“…Different isolates (variants) of the same species have been found to vary greatly in their phenotypes (16), but due to the relatively small number of isolates sequenced, the extent of genomic variation between strains is poorly understood. Owing to their genomes having multiple chromosomes that contribute to their relatively large genome sizes (30–45) in comparison to bacterial microbes (around 5 Mb), de novo genome assemblies of Metarhizium spp. using first-generation sequencing is very costly, and second-generation sequencing results in assemblies that are highly contiguous, falling apart around repeat rich and homologous regions of the genome.…”
Section: Introduction (mentioning, confidence: 99%)