2016
DOI: 10.1089/cmb.2015.0220

On the Minimum Error Correction Problem for Haplotype Assembly in Diploid and Polyploid Genomes

Abstract: In diploid genomes, haplotype assembly is the computational problem of reconstructing the two parental copies, called haplotypes, of each chromosome starting from sequencing reads, called fragments, possibly affected by sequencing errors. Minimum error correction (MEC) is a prominent computational problem for haplotype assembly and, given a set of fragments, aims at reconstructing the two haplotypes by applying the minimum number of base corrections. MEC is computationally hard to solve, but some approximation…
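To make the MEC objective in the abstract concrete, here is a minimal Python sketch for the diploid case. It is not the paper's algorithm. It assumes fragments are equal-length strings over `0`/`1` with `-` marking uncovered sites; that encoding and all function names are illustrative choices. The exhaustive search is exponential in the number of fragments and is only meant to show what is being minimized.

```python
from itertools import product

GAP = "-"  # assumed symbol for a site the fragment does not cover


def distance(fragment, haplotype):
    """Number of covered sites where the fragment disagrees with the haplotype."""
    return sum(1 for a, b in zip(fragment, haplotype) if a != GAP and a != b)


def mec_score(fragments, haplotypes):
    """Corrections needed when each fragment is assigned to its closest haplotype."""
    return sum(min(distance(f, h) for h in haplotypes) for f in fragments)


def consensus(fragments, n_sites):
    """Per-site majority allele of a group of fragments (ties and empty sites -> '0')."""
    hap = []
    for j in range(n_sites):
        ones = sum(1 for f in fragments if f[j] == "1")
        zeros = sum(1 for f in fragments if f[j] == "0")
        hap.append("1" if ones > zeros else "0")
    return "".join(hap)


def brute_force_mec(fragments):
    """Try every bipartition of the fragments; return (best score, two haplotypes).

    For a fixed bipartition the per-site majority vote minimizes the number of
    corrections, so the minimum over all bipartitions is the optimal MEC score.
    """
    n_sites = len(fragments[0])
    best = None
    for labels in product((0, 1), repeat=len(fragments)):
        if labels[0] == 1:
            continue  # swapping the two groups gives the same solution
        groups = [[f for f, l in zip(fragments, labels) if l == g] for g in (0, 1)]
        haps = [consensus(g, n_sites) for g in groups]
        score = sum(distance(f, haps[l]) for f, l in zip(fragments, labels))
        if best is None or score < best[0]:
            best = (score, haps)
    return best


if __name__ == "__main__":
    frags = ["0101", "01-1", "1010", "10-0", "0111"]
    score, haps = brute_force_mec(frags)
    print(score, haps)
```

In the toy input, the fragments `0101`, `0111`, and `1010` conflict pairwise, so no split into two groups is error-free and at least one base must be corrected.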

citations
Cited by 36 publications
(39 citation statements)
references
References 45 publications
“…The vast majority of existing haplotype assembly methods attempt to remove the aforementioned ambiguity by altering or even discarding the data, leading to minimum SNP removal (Lancia 2001), maximum fragments cut (Duitama 2010), and minimum error correction (MEC) score optimization criteria. Majority of haplotype assembly methods developed in recent years are focused on optimizing the MEC score, i.e., determining the smallest possible number of nucleotides in sequencing reads that should be altered such that the resulting dataset is consistent with having originated from k haplotypes (k denotes the ploidy of an organism) (Xie 2016;Pirola 2015;Kuleshov 2014;Patterson 2015;Bonizzoni 2016). These include the branch-and-bound scheme (Wang 2005), an integer linear programming formulation in (Chen 2013), and a dynamic programming framework in (Kuleshov 2014).…”
Section: Introduction (mentioning)
confidence: 99%
“…Beginning with Hapcompass [1], there has been some work on polyploid phasing using algorithms based on branch-and-extend [5], belief propagation [32] and semi-definite programming [14]. In a recent theoretical work [7], the hardness of optimizing the MEC for S > 2 has also been proven, indicating that algorithms for this problem need to be necessarily approximate or tailored to some assumptions. A major drawback of existing works is that they consider only S = 3, 4 and none have been developed, optimized, or tested for the high ploidy that is encountered in segmental duplications, where S can be potentially larger than 10, and to the low error-rate in Illumina sequencers.…”
Section: Introduction (mentioning)
confidence: 99%
“…For this reason, the vast majority of haplotype assembly techniques attempts to remove the aforementioned ambiguities by either discarding or altering sequencing data; this has led to the minimum fragment removal, minimum SNP removal [26], maximum fragments cut [16], and minimum error correction formulations of the assembly problem [29]. Most of the recent haplotype assembly methods (see, e.g., [7,25,31,32,40]) focus on the minimum error correction (MEC) formulation where the goal is to find the smallest number of nucleotides in reads that need to be changed so that any read partitioning ambiguities would be resolved. It has been shown that finding optimal solution to the MEC formulation of the haplotype assembly problem is NP-hard [7,10,26].…”
Section: Introduction (mentioning)
confidence: 99%
“…Most of the recent haplotype assembly methods (see, e.g., [7,25,31,32,40]) focus on the minimum error correction (MEC) formulation where the goal is to find the smallest number of nucleotides in reads that need to be changed so that any read partitioning ambiguities would be resolved. It has been shown that finding optimal solution to the MEC formulation of the haplotype assembly problem is NP-hard [7,10,26]. In [39], the authors used a branch-and-bound scheme to minimize the MEC objective over the space of reads; to reduce the search space, they relied on a bound on the objective obtained by a random partition of the reads.…”
Section: Introduction (mentioning)
confidence: 99%