2020
DOI: 10.1101/2020.08.12.247734
Preprint

Significantly improving the quality of genome assemblies through curation

Abstract: Background: Genome sequence assemblies provide the basis for our understanding of biology. Generating error-free assemblies is therefore the ultimate, but sadly still unachieved goal of a multitude of research projects. Despite the ever-advancing improvements in data generation, assembly algorithms and pipelines, no automated approach has so far reliably generated near error-free genome assemblies for eukaryotes. Results: Whilst working towards improved data sets and fully automated pipelines, assembly evaluation a…

Cited by 13 publications (12 citation statements)
References 43 publications
“…2018). The final draft assembly was manually curated to remove contaminants, correct structural integrity, assemble and identify chromosome-level scaffolds (Figure 2) based on gEVAL analyses and 3D-chromosomal interactions (Howe et al 2021). The remaining haplotype duplication was purged manually into an alternative haplotype genome.…”
Section: Methods (mentioning)
confidence: 99%
“…These interventions indicate that even with current state-of-the-art assembly algorithms, curation is essential for completing high-quality reference assemblies and for providing iterative feedback to improve assembly algorithms. A further description of our curation approach and analyses of VGP genomes are presented elsewhere 25. 2) for less aggressive contig joining.…”
Section: Curation Is Needed For a High-quality Reference (mentioning)
confidence: 99%
“…3 | Flow charts of assembly pipelines used to generate high-quality assemblies in this study. a, Standard VGP assembly pipeline when sequencing data of one individual, that generated the highest quality assemblies: generate primary pseudo-haplotype and alternate haplotype contigs with CLR using FALCON-Unzip 17; generate scaffolds with linked reads using Scaff10x 74; break mis-joins and further scaffold with optical maps using Solve 87; generate chromosome-scale scaffolds with Hi-C reads using Salsa2 79; fill in gaps and polish base-errors with CLR using Arrow (Pacific BioSciences); perform two or more rounds of short-read polishing with linked reads using FreeBayes 85; and perform expert manual curation to correct potential assembly errors using gEVAL 25,95. In this example, the alternate (alt) sequence was built at higher quality, attracting all linked-reads for polishing. The matching locus in the primary (pri) assembly was left unpolished, resulting in frameshift errors in the TLK1 gene.…”
Section: Reporting Summary (mentioning)
confidence: 99%
“…The assembly was decontaminated and manually curated using the gEVAL browser (Chow et al 2016; Howe et al 2021), resulting in 521 corrections (breaks, joins and removal of erroneously duplicated sequence). HiGlass (Kerpedjiev et al 2018) and PretextView (https://github.com/wtsi-hpag/PretextView) were used to visualize and rearrange the genome using Hi-C data, and PretextSnapshot (https://github.com/wtsi-hpag/PretextSnapshot) was used to generate an image of the Hi-C contact map.…”
Section: Curation (mentioning)
confidence: 99%