2019
DOI: 10.1101/772103
Preprint

centroFlye: Assembling Centromeres with Long Error-Prone Reads

Abstract: Although variations in centromeres have been linked to cancer and infertility, centromeres still represent the "dark matter of the human genome" and remain an enigma for both biomedical and evolutionary studies. Since centromeres have withstood all previous attempts to develop an automated tool for their assembly and since their assembly using short reads is viewed as intractable, recent efforts attempted to manually assemble centromeres using long error-prone reads. We describe the centroFlye algorithm for ce…

Cited by 15 publications (26 citation statements)
References 50 publications (45 reference statements)
“…Despite the improved performance of TE detection algorithms for short‐read sequencing data, it is still difficult to detect certain subsets of TE insertions, including those that accompany complex genomic rearrangements or fall into repetitive genomic regions. Genomic regions with existing TE copies from the same TE subfamily, or centromeric or telomeric regions with many gaps, are particularly challenging for TE detection due to limited short read mappability (Bzikadze & Pevzner, 2019; Jain et al., 2018; Miga et al., 2019). Recent advances in long‐read sequencing, notably PacBio and Oxford Nanopore (ONT) technologies, create ∼10‐ to 15‐Kbp‐long reads.…”
Section: The Use Of Long Reads For Comprehensive TE Insertion Detection
Citation type: mentioning
confidence: 99%
“…We benchmarked various approaches to string decomposition using centromeric reads from chromosome X since this centromere (referred to as cenX) was recently assembled, thus providing the ground truth for our benchmarking. This benchmarking utilized 2,680 reads (total read length 132.9 Mb) that were recruited to cenX in (Bzikadze and Pevzner, 2019). monomers and HOR sequences on cenX.…”
Section: Dataset
Citation type: mentioning
confidence: 99%
“…monomers and HOR sequences on cenX. We used the cenX HOR consensus sequence DXZ1* derived in Bzikadze and Pevzner, 2019. Appendix "Extracting monomers from DXZ1*" describes decomposition of DXZ1* into twelve monomers using Alpha-CENTAURI (Sevim et al., 2016).…”
Section: Dataset
Citation type: mentioning
confidence: 99%