H-PoP and H-PoPG: heuristic partitioning algorithms for single individual haplotyping of polyploids

Xie, Minzhu; Wu, Qiong; Wang, Jianxin; Jiang, Tao

doi:10.1093/bioinformatics/btw537

Cited by 56 publications (104 citation statements, all 104 classified as mentioning)

References 32 publications

“…We calculate this criterion for each haplotype block and report the average. The vector error rate is calculated by finding the minimum number of switches needed in haplotype segments in order to match the estimated haplotypes to the true haplotypes; this number is then divided by the haplotype length [7,15].…”

Section: Performance Assessment (mentioning)

confidence: 99%
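The vector error rate described in the excerpt above can be computed with a small dynamic program over permutations of the k haplotypes. This is a minimal sketch, assuming haplotypes are equal-length allele strings over the same SNPs, that every SNP column of the estimate is a permutation of the true column (correct genotypes), and that a "switch" means changing which estimated haplotype is mapped to which true haplotype. The helper names `vector_error_rate` and `average_over_blocks` are hypothetical and not taken from Hap10 or H-PoP.

```python
from itertools import permutations


def vector_error_rate(true_haps, est_haps):
    """Minimum number of permutation switches needed to match the estimated
    haplotypes to the true ones, divided by the haplotype length.

    Hypothetical helper; assumes every SNP column of the estimate is a
    permutation of the corresponding true column."""
    k = len(true_haps)
    n = len(true_haps[0])
    perms = list(permutations(range(k)))
    inf = float("inf")

    def matches(perm, j):
        # perm[h] is the estimated haplotype assigned to true haplotype h at SNP j
        return all(est_haps[perm[h]][j] == true_haps[h][j] for h in range(k))

    # cost[i]: fewest switches so far when SNP j is matched with mapping perms[i]
    cost = [0 if matches(p, 0) else inf for p in perms]
    for j in range(1, n):
        best_prev = min(cost)
        # keep the same mapping (no switch) or switch from the cheapest mapping
        cost = [
            min(cost[i], best_prev + 1) if matches(p, j) else inf
            for i, p in enumerate(perms)
        ]
    switches = min(cost)
    if switches == inf:
        raise ValueError("some SNP column cannot be matched by any permutation")
    return switches / n


def average_over_blocks(blocks):
    """Average the per-block rate over (true_haps, est_haps) pairs."""
    rates = [vector_error_rate(t, e) for t, e in blocks]
    return sum(rates) / len(rates)


if __name__ == "__main__":
    # Diploid example: the estimate is locally correct but swaps phase after SNP 2.
    print(vector_error_rate(["0011", "1100"], ["0000", "1111"]))  # 0.25
```

Because only the count of switches matters, the transition needs just the cheapest previous cost rather than a full k! by k! table, which keeps the loop linear in the number of SNPs for a fixed ploidy.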

“…SDhaP [6] solves a correlation clustering problem using a gradient method to estimate the haplotypes. H-PoP [7], a heuristic algorithm, solves a combinatorial optimization problem called "polyploid balanced optimal partition". Another approach is to use the minimum fragment removal (MFR) model in which conflicting fragments (due to erroneous reads) are removed.…”

Section: Introduction (mentioning)

confidence: 99%
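For the diploid case, the minimum fragment removal (MFR) model mentioned in the excerpt is usually stated on a fragment conflict graph: two fragments conflict when they cover a common SNP with different alleles, and MFR removes the fewest fragments so that the remaining graph is bipartite, with the two sides giving the two haplotypes. The sketch below only builds the conflict graph and tests bipartiteness; it does not solve the removal step itself, which is NP-hard in general. Representing a fragment as a dict from SNP index to allele is an assumption for illustration.

```python
from collections import deque


def conflicts(f1, f2):
    """Two fragments conflict if they disagree at a SNP they both cover."""
    return any(f2[snp] != allele for snp, allele in f1.items() if snp in f2)


def conflict_graph(fragments):
    """Adjacency lists of the fragment conflict graph."""
    n = len(fragments)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if conflicts(fragments[i], fragments[j]):
                adj[i].append(j)
                adj[j].append(i)
    return adj


def bipartition(adj):
    """Return a 0/1 haplotype side per fragment, or None if the graph is not
    bipartite (i.e. some fragments would have to be removed under diploid MFR)."""
    side = [None] * len(adj)
    for start in range(len(adj)):
        if side[start] is not None:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:  # breadth-first 2-colouring
            u = queue.popleft()
            for v in adj[u]:
                if side[v] is None:
                    side[v] = 1 - side[u]
                    queue.append(v)
                elif side[v] == side[u]:
                    return None
    return side
```

An odd cycle of conflicts (for example three fragments that pairwise disagree) cannot come from two error-free haplotypes, which is exactly the case `bipartition` reports as `None`.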


Hap10: reconstructing accurate and long polyploid haplotypes using linked reads

Majidian; Kahaei; Ridder

Preprint, 2020


Background: Haplotype information is essential for many genetic and genomic analyses, including genotype-phenotype associations in humans, animals and plants. Haplotype assembly is a method for reconstructing haplotypes from DNA sequencing reads. With the advent of new sequencing technologies, new algorithms are needed to obtain long and accurate haplotypes. While a few linked-read haplotype assembly algorithms are available for diploid genomes, there are no algorithms yet for polyploids. Results: The first haplotyping algorithm designed for 10X linked reads generated from a polyploid genome is presented, built on a typical short-read haplotyping method, SDhaP. Using the input aligned reads and called variants, the haplotype-relevant information is extracted. Next, reads with the same barcodes are combined to produce molecule-specific fragments. Then, these fragments are clustered into strongly connected components, which are then used as input to a haplotype assembly core in order to estimate accurate and long haplotypes. Conclusions: Hap10 is a novel algorithm for haplotype assembly of polyploid genomes using linked reads. The performance of the algorithm is evaluated in a number of simulation scenarios, and its applicability is demonstrated on a real dataset of sweet potato.
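The two preprocessing steps in the abstract (merging reads by barcode into fragments, then grouping fragments that are linked through shared SNPs) can be illustrated as follows. This is a simplified sketch, not Hap10's implementation: it assumes a hypothetical input of `(barcode, {snp: allele})` pairs, treats each barcode as a single molecule, drops SNPs where reads of the same barcode disagree, and uses plain undirected connected components (via union-find) in place of the abstract's strongly connected components.

```python
from collections import defaultdict


def reads_to_fragments(reads):
    """Merge reads sharing a barcode into one fragment {snp: allele}.

    Simplification: one barcode = one molecule; SNPs where the merged reads
    disagree are dropped from the fragment."""
    calls = defaultdict(lambda: defaultdict(set))
    for barcode, alleles in reads:
        for snp, allele in alleles.items():
            calls[barcode][snp].add(allele)
    return {
        barcode: {s: next(iter(a)) for s, a in snps.items() if len(a) == 1}
        for barcode, snps in calls.items()
    }


def connected_components(fragments):
    """Group fragments linked, directly or transitively, by shared SNPs."""
    parent = {b: b for b in fragments}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_seen = {}  # SNP -> first barcode observed covering it
    for barcode, frag in fragments.items():
        for snp in frag:
            if snp in first_seen:
                parent[find(barcode)] = find(first_seen[snp])
            else:
                first_seen[snp] = barcode
    groups = defaultdict(list)
    for barcode in fragments:
        groups[find(barcode)].append(barcode)
    return sorted(sorted(g) for g in groups.values())
```

Each resulting component can then be phased independently, since fragments in different components share no SNPs and carry no information about each other's relative phase.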


“…The vast majority of existing haplotype assembly methods attempt to remove the aforementioned ambiguity by altering or even discarding the data, leading to minimum SNP removal (Lancia 2001), maximum fragments cut (Duitama 2010), and minimum error correction (MEC) score optimization criteria. The majority of haplotype assembly methods developed in recent years focus on optimizing the MEC score, i.e., determining the smallest possible number of nucleotides in sequencing reads that should be altered such that the resulting dataset is consistent with having originated from k haplotypes (k denotes the ploidy of an organism) (Xie 2016;Pirola 2015;Kuleshov 2014;Patterson 2015;Bonizzoni 2016). These include the branch-and-bound scheme (Wang 2005), an integer linear programming formulation in (Chen 2013), and a dynamic programming framework in (Kuleshov 2014).…”

Section: Introduction (mentioning)

confidence: 99%
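The MEC criterion quoted above can be made concrete: for a fixed set of k haplotypes, each fragment is charged the number of its allele calls that differ from its closest haplotype, and the MEC score is the sum of these charges. The sketch below computes that score and, for toy inputs only, finds the optimum by exhaustive search over binary haplotypes. The fragment format (dict from SNP index to 0/1 allele) and the function names are assumptions for illustration; the cited methods use branch-and-bound, ILP, dynamic programming or heuristics instead of brute force.

```python
from itertools import combinations_with_replacement, product


def mec_score(fragments, haplotypes):
    """Number of allele calls that must be flipped so that every fragment
    agrees with one of the given haplotypes (each fragment uses its closest)."""
    return sum(
        min(sum(h[snp] != allele for snp, allele in frag.items()) for h in haplotypes)
        for frag in fragments
    )


def min_mec_bruteforce(fragments, n_snps, k):
    """Exhaustive MEC minimisation over all multisets of k binary haplotypes.
    Exponential in n_snps * k, so only usable on toy examples."""
    candidates = list(product((0, 1), repeat=n_snps))
    best = None
    for haps in combinations_with_replacement(candidates, k):
        score = mec_score(fragments, haps)
        if best is None or score < best[0]:
            best = (score, haps)
    return best
```

For example, the fragments `000`, `111` and `001` cannot all come from two error-free haplotypes, so the minimum MEC for k = 2 is 1, while for k = 3 it drops to 0.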

“…Among the aforementioned methods, only HapCompass (Aguiar 2012), SDhaP (Das 2015) and BP (Puljiz 2016) are capable of solving the haplotype assembly problem for k > 2. Other techniques that can handle reconstruction of haplotypes for both diploid and polyploid genomes include a Bayesian method HapTree (Berger 2014), a dynamic programming method H-PoP (Xie 2016) shown to be more accurate than the techniques in (Aguiar 2012;Berger 2014;Das 2015), and the matrix factorization schemes in (Cai 2016;Hashemi 2018).…”

Section: Introduction (mentioning)

confidence: 99%

A Graph Auto-Encoder for Haplotype Assembly and Viral Quasispecies Reconstruction

Vikalo

Preprint, 2019


Reconstructing components of a genomic mixture from data obtained by means of DNA sequencing is a challenging problem encountered in a variety of applications, including single individual haplotyping and studies of viral communities. High-throughput DNA sequencing platforms oversample mixture components to provide massive amounts of reads whose relative positions can be determined by mapping the reads to a known reference genome; assembly of the components, however, requires discovering the reads' origin, an NP-hard problem that existing methods struggle to solve with the required level of accuracy. In this paper, we present a learning framework based on a graph auto-encoder designed to exploit structural properties of sequencing data. The algorithm is a neural network which essentially learns to ignore sequencing errors and infers the posterior probabilities of the origin of sequencing reads. Mixture components are then reconstructed by finding the consensus of the reads determined to originate from the same genomic component. Results on realistic synthetic as well as experimental data demonstrate that the proposed framework reliably assembles haplotypes and reconstructs viral communities, often significantly outperforming state-of-the-art techniques.
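The final step described in the abstract, building each component as the consensus of the reads assigned to it, can be sketched independently of the neural network. This minimal sketch assumes a hypothetical input of reads as `{position: base}` dicts plus a per-read list of component probabilities standing in for the network's posteriors; each read goes to its most probable component, and each position takes a majority vote (ties go to the base seen first).

```python
from collections import Counter


def consensus_from_posteriors(reads, posteriors, n_components, length):
    """Assign each read to its most probable component, then take a
    per-position majority vote of the assigned reads.

    reads: list of {position: base}; posteriors: per-read component
    probabilities (hypothetical stand-in for the network output).
    Positions not covered by any assigned read are returned as None."""
    votes = [[Counter() for _ in range(length)] for _ in range(n_components)]
    for read, probs in zip(reads, posteriors):
        comp = max(range(n_components), key=lambda c: probs[c])
        for pos, base in read.items():
            votes[comp][pos][base] += 1
    return [
        [v.most_common(1)[0][0] if v else None for v in comp_votes]
        for comp_votes in votes
    ]
```

Hard assignment by the largest posterior is the simplest choice; weighting each vote by its posterior probability would be a natural variant when assignments are uncertain.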


“…Examples of diploid haplotype assembly tools are WhatsHap (Patterson et al, 2015), phASER (Castel et al, 2016), HapCUT2 (Edge et al, 2017), ProbHap (Kuleshov, 2014) and HapCol (Pirola et al, 2016). Examples of polyploid haplotype assembly tools are HapCompass (Aguiar and Istrail, 2012), HapTree (Berger et al, 2014), SDhaP (Das and Vikalo, 2015), and H-PoP (Xie et al, 2016). The disadvantage of haplotype assembly programs is that they depend on a high-quality reference sequence as a backbone and, in addition, on external variant call sets, which are major external factors that can introduce non-negligible biases.…”

Section: Introduction (mentioning)

confidence: 99%

Overlap graph-based generation of haplotigs for diploids and polyploids

Baaijens; Schoenhuth

Preprint, 2018


Haplotype aware genome assembly plays an important role in genetics, medicine, and various other disciplines, yet generation of haplotype-resolved de novo assemblies remains a major challenge. Beyond distinguishing between errors and true sequential variants, one needs to assign the true variants to the different genome copies. Recent work has pointed out that the enormous quantities of traditional NGS read data have been greatly underexploited in terms of haplotig computation so far, which reflects that methodology for reference independent haplotig computation has not yet reached maturity. We present POLYTE (POLYploid genome fitTEr) as a new approach to de novo generation of haplotigs for diploid and polyploid genomes. Our method follows an iterative scheme where in each iteration reads or contigs are joined, based on their interplay in terms of an underlying haplotype-aware overlap graph. Along the iterations, contigs grow while preserving their haplotype identity. Benchmarking experiments on both real and simulated data demonstrate that POLYTE establishes new standards in terms of error-free reconstruction of haplotype-specific sequence. As a consequence, POLYTE outperforms state-of-the-art approaches in various relevant aspects, where advantages become particularly distinct in polyploid settings. POLYTE is freely available as part of the HaploConduct package at https://github.com/HaploConduct/HaploConduct, implemented in Python and C++.
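To illustrate what an overlap graph over reads looks like, the sketch below computes suffix-prefix overlaps and keeps only those with few or no mismatches, so that reads from different haplotypes (which differ at variant sites) tend not to be linked. This is a generic simplification for illustration only: it is not POLYTE's overlap criterion or its iterative merging scheme, and `min_len` and `max_mismatches` are hypothetical parameters.

```python
def suffix_prefix_overlap(a, b, min_len, max_mismatches=0):
    """Length of the longest suffix of `a` matching a prefix of `b` with at
    most `max_mismatches` mismatches, or 0 if none of length >= min_len exists."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        mismatches = sum(x != y for x, y in zip(a[-length:], b[:length]))
        if mismatches <= max_mismatches:
            return length
    return 0


def overlap_graph(reads, min_len, max_mismatches=0):
    """Directed edges (i, j) -> overlap length, meaning read i's suffix
    overlaps read j's prefix."""
    edges = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                ov = suffix_prefix_overlap(a, b, min_len, max_mismatches)
                if ov:
                    edges[(i, j)] = ov
    return edges
```

Setting `max_mismatches` to 0 is the strictest, most haplotype-preserving choice; allowing a mismatch or two tolerates sequencing errors but risks joining reads from different haplotypes.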
