Efficient Reconstruction of Haplotype Structure via Perfect Phylogeny

Running head: Dictionary Model for Haplotypes

Keywords: linkage disequilibrium; haplotype blocks; minimum description length; forward and backward algorithms; EM algorithm.

Abstract: We propose a dictionary model for haplotypes. According to the model, a haplotype is constructed by randomly concatenating haplotype segments from a given dictionary of segments. A haplotype block is defined as a set of haplotype segments that begin and end with the same pair of markers. In this framework, haplotype blocks can overlap, and the model provides a setting for testing the accuracy of simpler models invoking only nonoverlapping blocks. Each haplotype segment in a dictionary has an assigned probability and alternate spellings that account for genotyping errors and mutation. The model also allows for missing data, unphased genotypes, and prior distributions on parameters. Likelihood evaluations rely on forward and backward recurrences similar to the ones encountered in hidden Markov models. Parameter estimation is carried out with an EM algorithm. The search for the optimal dictionary is particularly difficult because of the variable dimension of the model space. We define a minimum description length criterion to evaluate each dictionary and use a combination of greedy search and careful initialization to select a best dictionary for a given data set. Application of the model to simulated data gives encouraging results. In a real data set, we are able to reconstruct a parsimonious dictionary that captures patterns of linkage disequilibrium well.
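The forward recurrence mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a fully phased haplotype with no missing data, ignores alternate spellings, and represents the dictionary as hypothetical `(start, alleles, prob)` tuples, where `start` is the marker index at which the segment begins. The function name `haplotype_likelihood` is likewise an assumption made for illustration.

```python
from collections import defaultdict


def haplotype_likelihood(haplotype, segments):
    """Sum the probabilities of all ways to spell `haplotype` as a
    concatenation of dictionary segments (simplified forward recurrence).

    haplotype: string of alleles, one character per marker.
    segments:  iterable of (start, alleles, prob) tuples (hypothetical format).
    """
    # Group the segments by the marker at which they start.
    by_start = defaultdict(list)
    for start, alleles, prob in segments:
        if alleles:  # ignore empty segments, which would not advance
            by_start[start].append((alleles, prob))

    n = len(haplotype)
    # forward[i] = total probability of generating the first i markers.
    forward = [0.0] * (n + 1)
    forward[0] = 1.0

    for start in range(n):
        if forward[start] == 0.0:
            continue  # no concatenation ends exactly at this marker
        for alleles, prob in by_start[start]:
            end = start + len(alleles)
            if end <= n and haplotype[start:end] == alleles:
                forward[end] += forward[start] * prob

    return forward[n]


# Toy dictionary over four markers (values are illustrative only).
toy_segments = [
    (0, "01", 0.6),
    (0, "0", 0.4),
    (1, "1", 1.0),
    (2, "10", 0.5),
    (2, "11", 0.5),
]
likelihood = haplotype_likelihood("0110", toy_segments)
```

In this toy dictionary the haplotype `0110` can be spelled two ways (`01` + `10`, or `0` + `1` + `10`), and the recurrence sums the probabilities of both without enumerating them explicitly.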


“…Focusing on the detectability of ancestral recombination has motivated interested developments in computer science (Eskin et al, 2003).…”

Section: Results (mentioning)

confidence: 99%

Reconstructing Ancestral Haplotypes with a Dictionary Model

Ayers

Sabatti

Lange

2006

Journal of Computational Biology


“…If not, using our freedom of labeling, we convert the data so that it contains the same information with the all zeros taxa (see section 2.2 of Eskin et al [4] for details). We now remove any character that contains only one state.…”

Section: Lemma 1 [8] the Most Parsimonious Phylogeny For Input I Is (mentioning)

confidence: 99%

Simple Reconstruction of Binary Near-Perfect Phylogenetic Trees

Sridhar

Dhamdhere

Blelloch

et al. 2006

Computational Science – ICCS 2006

Self Cite


Abstract. We consider the problem of reconstructing near-perfect phylogenetic trees using binary character states (referred to as BNPP). A perfect phylogeny assumes that every character mutates at most once in the evolutionary tree, yielding an algorithm for binary character states that is computationally efficient but not robust to imperfections in real data. A near-perfect phylogeny relaxes the perfect phylogeny assumption by allowing at most a constant number q of additional mutations. In this paper, we develop an algorithm for constructing optimal phylogenies and provide empirical evidence of its performance. The algorithm runs in time O((72κ)^q nm + nm^2), where n is the number of taxa, m is the number of characters, and κ is the number of characters that share four gametes with some other character. This is fixed parameter tractable when q and κ are constants and significantly improves on the previous asymptotic bounds by reducing the exponent to q. Furthermore, the complexity of the previous work makes it impractical and in fact no known implementation of it exists. We implement our algorithm and demonstrate it on a selection of real data sets, showing that it substantially outperforms its worst-case bounds and yields far superior results to a commonly used heuristic method in at least one case. Our results therefore describe the first practical phylogenetic tree reconstruction algorithm that finds guaranteed optimal solutions while being easily implemented and computationally feasible for data sets of biologically meaningful size and complexity.
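The quantity κ in the running time counts characters that share four gametes with some other character. A minimal sketch of that count follows, assuming the input is a 0/1 matrix with taxa as rows and characters as columns; the function names are hypothetical and this is not the algorithm from the paper, only the standard four-gamete test for binary characters.

```python
from itertools import combinations


def has_four_gametes(col_a, col_b):
    """True if two binary characters show all four gametes 00, 01, 10, 11."""
    return len(set(zip(col_a, col_b))) == 4


def conflicting_characters(matrix):
    """Indices of characters that share four gametes with some other character.

    matrix: list of rows (taxa), each a sequence of 0/1 character states.
    """
    if not matrix:
        return set()
    columns = list(zip(*matrix))  # transpose: one tuple per character
    conflicts = set()
    for i, j in combinations(range(len(columns)), 2):
        if has_four_gametes(columns[i], columns[j]):
            conflicts.update((i, j))
    return conflicts


def admits_perfect_phylogeny(matrix):
    """A binary matrix admits a perfect phylogeny iff no pair of
    characters exhibits all four gametes."""
    return not conflicting_characters(matrix)


# Illustrative inputs only.
tree_like = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 0, 1]]
conflicted = [[0, 0], [0, 1], [1, 0], [1, 1]]
kappa = len(conflicting_characters(conflicted))
```

The pairwise check costs O(nm^2) for n taxa and m characters, which matches the additive nm^2 term in the bound quoted above.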


“…Haplotype Resolution via Perfect Phylogeny The complete algorithm as well as proofs of correctness are given in [3]. Here we give a summary of the algorithm.…”

Section: A Appendix (mentioning)

confidence: 99%

Large Scale Recovery of Haplotypes from Genotype Data Using Imperfect Phylogeny

Halperin

2004

Computational Methods for SNPs and Haplotype Inference

Self Cite


Critical to the understanding of the genetic basis for complex diseases is the modeling of human variation. Most of this variation can be characterized by single nucleotide polymorphisms (SNPs), which are mutations at a single nucleotide position. To characterize an individual's variation, we must determine the individual's haplotype, that is, which nucleotide base occurs at each position of these common SNPs for each chromosome. In this paper, we present results for a highly accurate method for haplotype resolution from genotype data. Our method leverages a new insight into the underlying structure of haplotypes which shows that SNPs are organized in highly correlated "blocks". The majority of individuals have one of about four common haplotypes in each block. Our method partitions the SNPs into blocks and, for each block, predicts the common haplotypes and each individual's haplotype. We evaluate our method over biological data. Our method predicts the common haplotypes perfectly and has a very low error rate (0.47%) when taking into account the predictions for the uncommon haplotypes. The algorithm is available via webserver at


Cited by 102 publications

References 15 publications
