2020
DOI: 10.1101/2020.02.04.933523
Preprint

Haplotype Threading: Accurate Polyploid Phasing from Long Reads

Abstract: Resolving genomes at haplotype level is crucial for understanding the evolutionary history of polyploid species and for designing advanced breeding strategies. As a highly complex computational problem, polyploid phasing still presents considerable challenges, especially in regions of collapsing haplotypes. We present WhatsHap polyphase, a novel two-stage approach that addresses these challenges by (i) clustering reads using a position-dependent scoring function and (ii) threading the haplotypes through the clusters by dynam…

Citations: cited by 23 publications (30 citation statements)
References: 36 publications
“…Some methods currently exist to phase polyploids using long read data such as WhatsHap polyphase 20, as well as other methods which were mostly designed to work with short read sequencing data but can sometimes use long reads as input 21,22,23. Because nPhase is a phasing tool that leverages the linking power of long reads to achieve its high accuracy and contiguity metrics, we did not benchmark it against tools that rely exclusively on short reads for phasing, since these are inherently limited by the size of their reads.…”
Section: Benchmarking nPhase Against Other Polyploid Phasing Tools
Citation type: mentioning
confidence: 99%
“…For polyploids, however, a variable position can be one of two or up to six possible states (all four bases, a deletion or an insertion) and this deduction is no longer possible, rendering the task of phasing significantly more complex. Some methods currently exist to phase polyploids but mainly using short read sequencing and leading to a low accuracy and contiguity phasing 20,21,22,23.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Due to the increased prevalence of long-read data from Oxford Nanopore or PacBio, newer methods taking advantage of the longer-range correlations accessible through long-read data have been proposed [18,19]. Unfortunately, because the error profiles of long-read technologies differ considerably from Illumina short reads (e.g.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Unfortunately, a good MEC score may not imply a good phasing when errors are present [23]. This shortcoming is further exacerbated in the polyploid setting because similar haplotypes may be clustered together since the MEC model does not consider coverage; this phenomenon is known as genome collapsing [19]. Thus, although the MEC model can be applied to the polyploid setting, it may be suboptimal; however, there is yet to be an alternative commonly agreed upon formulation of the polyploid phasing problem.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
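
The statement above refers to the MEC (minimum error correction) score without defining it. As a rough illustration only, and not the implementation used by any of the cited tools, the sketch below computes an MEC-style score under simplifying assumptions: reads and haplotypes are equal-length allele strings, `-` marks a position a read does not cover, and each read is charged its mismatches against whichever haplotype it fits best. The function name `mec_score` and the toy data are hypothetical.

```python
from typing import Sequence


def mec_score(reads: Sequence[str], haplotypes: Sequence[str], gap: str = "-") -> int:
    """Sum over reads of the mismatch count to the best-fitting haplotype.

    Assumes every read and haplotype has the same length; positions where
    the read holds the gap symbol are treated as uncovered and ignored.
    """
    total = 0
    for read in reads:
        # Mismatches of this read against each candidate haplotype.
        best = min(
            sum(1 for r, h in zip(read, hap) if r != gap and r != h)
            for hap in haplotypes
        )
        total += best
    return total


# Toy example (hypothetical data): five reads over four variant positions.
reads = ["0101", "01-1", "1010", "10-0", "0111"]

two_haps = ["0101", "1010"]
# Same set with one haplotype duplicated, as if it were present in two copies.
three_haps = ["0101", "0101", "1010"]

score_two = mec_score(reads, two_haps)
score_three = mec_score(reads, three_haps)
```

Under these assumptions `score_two` and `score_three` are equal: duplicating a haplotype never changes the score, because each read is only compared with its closest haplotype. This is one way to see why a score of this kind carries no information about how many copies of a haplotype are present, which is the coverage blindness the quoted passage associates with collapsing.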
“…Siragusa et al. devised a new algorithm based on the MFR model, which uses integer linear programming [10]. Polyphase, part of WhatsHap [11], is a method for polyploid haplotyping developed for short and long reads. Reads are clustered based on a position-based score, and haplotypes are threaded by dynamic programming.…”
Section: Introduction
Citation type: mentioning
confidence: 99%