2016
DOI: 10.1101/071282
Preprint
Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation

Abstract: Long-read single-molecule sequencing has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. However, given the relatively high error rates of such technologies, efficient and accurate assembly of large repeats and closely related haplotypes remains challenging. We address these issues with Canu, a successor of Celera Assembler that is specifically designed for noisy single-molecule sequences. Canu introduces support for nanopore sequencing, halves dept…

Cited by 1,150 publications (1,428 citation statements)
References 76 publications
“…Two single-molecule real-time (SMRT) cells were used for an output of 542,585,804 bases, a mean read length of 8,141, and 86× reference coverage. DNA sequencing data sets were analyzed using a combination of de novo assembly [short reads, SOAP denovo (10); long reads, Canu (11)] and nucleotide variant identification methods [short reads, Stampy, SAMtools, and VCFtools (12–14); long reads, Pilon (15); and MUMmer (16)]. This allowed both an update of the genome nucleotide sequence and the identification of genomic regions that had been misassembled, or missed entirely, in the original sequencing project.…”
Section: Genome Announcement
mentioning
confidence: 99%
“…In total, 190 euchromatic gaps were targeted for gap closure with AK1 assembly. The gaps that could not be closed or extended with the AK1 assembly were subjected to closure through local assembly using Canu (27) or a contiguous subread. Subreads mapped 10 kb upstream or downstream of the gap were chosen for local assembly.…”
mentioning
confidence: 99%
“…De Bruijn graphs, and k-mer-based processing in general, have also proven useful, even for long read sequence analysis (Carvalho et al., 2016; Koren et al., 2017; Salmela et al., 2016).…”
Section: Introduction and Related Work
mentioning
confidence: 99%