2021
DOI: 10.1186/s12859-020-03939-y

HapSolo: an optimization approach for removing secondary haplotigs during diploid genome assembly and scaffolding

Abstract: Background: Despite marked recent improvements in long-read sequencing technology, the assembly of diploid genomes remains a difficult task. A major obstacle is distinguishing between alternative contigs that represent highly heterozygous regions. If primary and secondary contigs are not properly identified, the primary assembly will overrepresent both the size and complexity of the genome, which complicates downstream analysis such as scaffolding. Results: …

Cited by 9 publications (11 citation statements)
References 47 publications
“…We generated 81.2 gigabases, equivalent to roughly 90× coverage, based on the expected 1C genome size of 896 Mb (Arumuganathan and Earle 1991). We assembled PacBio SMRT reads using Canu (v 2.1), producing a genome of 1,456 Mb with 5,122 contigs, and then applied HapSolo (Solares et al 2021) to remove putative secondary contigs (or haplotigs). The Canu + HapSolo (C + H) genome resulted in a primary assembly of 1,032 Mb, a longest contig of 17 Mb, a BUSCO score of 91%, and an N50 of 3.37 Mb (Supplementary Table 1).…”
Section: Results
Citation type: mentioning
confidence: 99%
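The figures in this statement can be cross-checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the inputs are the numbers quoted above, not values taken from the cited paper's data files.

```python
# Illustrative arithmetic only. Function names are hypothetical; the inputs
# are the figures quoted in the citation statement above.

def fold_coverage(total_bases: float, genome_size_bases: float) -> float:
    """Sequencing depth: total sequenced bases divided by the 1C genome size."""
    return total_bases / genome_size_bases


def fraction_removed(before_mb: float, after_mb: float) -> float:
    """Share of the raw assembly that was dropped as putative haplotigs."""
    return (before_mb - after_mb) / before_mb


# 81.2 gigabases of reads against an expected 896 Mb genome.
coverage = fold_coverage(81.2e9, 896e6)

# Canu assembly of 1,456 Mb reduced to a 1,032 Mb primary assembly.
removed = fraction_removed(1456, 1032)

# Size of the primary assembly relative to the expected genome size.
size_ratio = 1032 / 896

print(f"coverage: {coverage:.1f}x")
print(f"removed:  {removed:.1%}")
print(f"primary assembly / expected size: {size_ratio:.2f}")
```

Under these inputs the depth works out to about 90.6×, consistent with the quoted "roughly 90×", and about 29% of the raw assembly was removed. The primary assembly remains about 15% larger than the expected genome size.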
“…Once assembled, polishing was performed with 2 passes of PacBio GenomicConsensus v2.33, followed by 2 passes with Pilon v1.23 (Walker et al 2014) using default parameters and 19× coverage of Gwen short-read Illumina sequencing data. HapSolo v0.1 (Solares et al 2021), which identifies and removes alternative haplotypes, was then run on the assembly with default parameters and 50,000 iterations, producing the Canu + Hapsolo (C + H) assembly. Scaffolding was based on a Gwen × Fuerte genetic map (Ashworth et al 2019) by aligning the C + H assembly using NCBI BLAST v2.2.31+ (Altschul et al 1990).…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…An alternative method would be to sequence multiple individuals from a highly inbred homozygous line. Many species cannot be maintained in a laboratory setting, however, making the establishment of inbred lines for these taxa challenging or impossible (Solares et al, 2021). It is also now well-recognized that inbreeding can significantly reduce genome size (Price, 1976; Fierst et al, 2015; Roessler et al, 2019), making these lines poor representatives of the species as whole.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…A highly contiguous and annotated genome assembly can be an invaluable tool for population genomic inference (Ellegren, 2014) enabling both genome‐scale and reduced representation methods for sequencing populations (Ekblom & Galindo, 2010; Fonseca et al, 2016; Matz, 2017). However, even as the cost of second‐ and third‐generation sequencing continues to drop (van Dijk et al, 2014), many challenges remain for assembling references for non‐model organisms (Roach et al, 2018; Solares et al, 2021). One of the greatest hurdles to accurate genome assembly is heterozygosity (Kajitani et al, 2014; Safonova et al, 2015; Vinson et al, 2005), a hallmark of many wild plant and animal species, including most insects and marine invertebrates.…”
Section: Introduction
Citation type: mentioning
confidence: 99%