2008
DOI: 10.1109/tcbb.2007.70223

Highly Scalable Genotype Phasing by Entropy Minimization

Abstract: A Single Nucleotide Polymorphism (SNP) is a position in the genome at which two or more of the possible four nucleotides occur in a large percentage of the population. SNPs account for most of the genetic variability between individuals, and mapping SNPs in the human population has become the next high-priority in genomics after the completion of the Human Genome project. In diploid organisms such as humans, there are two non-identical copies of each autosomal chromosome. A description of the SNPs in …

Cited by 19 publications (27 citation statements)
References 25 publications (37 reference statements)
“…Two limitations of previous uses of HMMs in this context have been the relatively slow training based on genotype data and the inability to exploit available pedigree information. We overcome these limitations by training our HMM using haplotypes inferred by the pedigree-aware phasing algorithm of Gusev et al (2008), based on entropy minimization. Becker et al (2006) use maximum phasing probability of a trio genotype as the likelihood function whose sensitivity to single SNP genotype deletions signals potential errors.…”
Section: Kennedy Et Al (mentioning)
confidence: 99%
“…Since EM-based training is generally slow and cannot be easily modified to take advantage of phase information that can be inferred from available family relationships, we adopted the following two-step approach for training our HMM. First, we use the highly scalable ENT algorithm of Gusev et al (2008) to infer haplotypes for all individuals in the sample based on entropy minimization. ENT can handle genotypes related by arbitrary pedigrees, and has been shown to yield high phasing accuracy as measured by the so-called switching error, which implies that inferred haplotypes are locally correct with very high probability.…”
Section: Hidden Markov Model (mentioning)
confidence: 99%
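The switching error mentioned in the statement above can be illustrated with a minimal sketch. It assumes haplotypes are given as equal-length lists of 0/1 alleles and that the inferred pair is consistent with the true genotype; the function name `switch_error_rate` is hypothetical and is not taken from ENT or the cited papers.

```python
from typing import Sequence, Tuple

Haplotype = Sequence[int]  # assumed encoding: one 0/1 allele per SNP


def switch_error_rate(
    true_pair: Tuple[Haplotype, Haplotype],
    inferred_pair: Tuple[Haplotype, Haplotype],
) -> float:
    """Fraction of adjacent heterozygous sites whose relative phase is wrong."""
    t1, t2 = true_pair
    i1, _ = inferred_pair
    # Only heterozygous sites carry phase information.
    het = [k for k in range(len(t1)) if t1[k] != t2[k]]
    if len(het) < 2:
        return 0.0
    # At each heterozygous site, record whether the first inferred haplotype
    # agrees with the first true haplotype.
    orient = [i1[k] == t1[k] for k in het]
    # A switch occurs whenever that agreement flips between neighbouring sites.
    switches = sum(1 for a, b in zip(orient, orient[1:]) if a != b)
    return switches / (len(het) - 1)


if __name__ == "__main__":
    truth = ([0, 0, 0, 0], [1, 1, 1, 1])
    guess = ([0, 0, 1, 1], [1, 1, 0, 0])  # one switch between sites 2 and 3
    print(switch_error_rate(truth, guess))
```

Because only flips between neighbouring heterozygous sites are counted, a low value means the inferred haplotypes are locally correct even if distant sites end up on the wrong copy.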
“…It is therefore a special case of the minimum entropy set cover problem. Experimental results derived from this work were proposed by Bonizzoni et al [3], and Gusev, Măndoiu, and Paşaniuc [20].…”
Section: Minimum Entropy Set Cover (mentioning)
confidence: 75%
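To make the entropy-minimization objective concrete, here is a minimal sketch that scores a candidate phasing by the Shannon entropy of its haplotype frequencies. It illustrates the objective only and is not the ENT algorithm; the string encoding of haplotypes and the name `phasing_entropy` are assumptions made for this example.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def phasing_entropy(phasing: Iterable[Tuple[str, str]]) -> float:
    """Shannon entropy (in bits) of the haplotypes used by a phasing.

    `phasing` holds one (haplotype, haplotype) pair per individual.
    """
    counts = Counter(h for pair in phasing for h in pair)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # Individual A is heterozygous at both SNPs, so two phasings explain it;
    # individual B is homozygous and can only be phased as ("00", "00").
    reuse_common = [("00", "11"), ("00", "00")]
    use_rare = [("01", "10"), ("00", "00")]
    print(phasing_entropy(reuse_common))  # lower entropy
    print(phasing_entropy(use_rare))      # higher entropy
```

Under this objective the first phasing is preferred, because it explains both genotypes while reusing the haplotype `00` instead of introducing two new ones.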
“…However, since the efficient score is valid even when this working model is misspecified, we are able to choose estimators that are computationally simple, secure in the knowledge that misspecification will not affect the validity of the test. One approach, which we utilize in the simulation experiment below, is to estimate (h | g, e) by computing full-chromosome diplotypes for each study participant using a fast phasing program (for example, ent [15]), and then letting (h | g, e) be the degenerate distribution that puts all mass on the imputed diplotype.…”
Section: [equation] (mentioning)
confidence: 99%