2013
DOI: 10.1093/sysbio/syt044

Minimizing the Average Distance to a Closest Leaf in a Phylogenetic Tree

Abstract: When performing an analysis on a collection of molecular sequences, it can be convenient to reduce the number of sequences under consideration while maintaining some characteristic of a larger collection of sequences. For example, one may wish to select a subset of high-quality sequences that represent the diversity of a larger collection of sequences. One may also wish to specialize a large database of characterized "reference sequences" to a smaller subset that is as close as possible on average to a collect…
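One way to write the objective in the title precisely is shown below. The notation is introduced here for illustration: H is the full collection of n sequences sitting at the leaves of the tree, X ⊆ H is the selected subset of size k, and d(h, x) is the branch-length (path) distance in the tree between leaves h and x. The average distance to a closest leaf (ADCL) of a subset, and the subset sought, would then be:

```latex
\mathrm{ADCL}(X) \;=\; \frac{1}{n} \sum_{h \in H} \min_{x \in X} d(h, x),
\qquad
X^{\ast} \;=\; \operatorname*{arg\,min}_{X \subseteq H,\; |X| = k} \mathrm{ADCL}(X).
```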

Cited by 12 publications (19 citation statements)
References 21 publications
“…An amplicon-specific profile HMM was created from an alignment of representative sequences from multiple subtypes. For each subject and amplicon, 20 reference sequences were selected by placing 454 reads on a tree of candidate reference sequences [34] and minimizing the average distance to the closest leaf [35]. These reference sequences, representatives from subtypes common to the region, and 454 reads were aligned to the HMM using hmmalign [36] and non-consensus columns removed.…”
Section: Methods (mentioning; confidence: 99%)
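As a rough illustration of the reference-selection step described above, the sketch below assumes that the branch-length distance from each placed read to each candidate reference has already been computed; the distance values and the function names `adcl` and `best_subset_exhaustive` are hypothetical. It scores a subset of references by the average, over reads, of the distance to the nearest chosen reference, and finds the best subset by brute force. Brute force is only practical for toy inputs (choosing 20 references this way would not scale) and is not the method used in the cited works.

```python
from itertools import combinations


def adcl(dist, chosen):
    """Average, over all reads (rows), of the distance to the closest chosen reference.

    dist[q][r] is an assumed, precomputed distance from read q to candidate reference r.
    """
    return sum(min(row[r] for r in chosen) for row in dist) / len(dist)


def best_subset_exhaustive(dist, k):
    """Try every k-subset of candidate references; feasible only for tiny inputs."""
    n_refs = len(dist[0])
    return min(combinations(range(n_refs), k), key=lambda s: adcl(dist, s))


# Hypothetical distances from 4 placed reads (rows) to 3 candidate references (columns).
dist = [
    [0.1, 0.5, 0.9],
    [0.2, 0.4, 0.8],
    [0.9, 0.3, 0.1],
    [0.8, 0.6, 0.2],
]

best = best_subset_exhaustive(dist, 2)
# Reads 0 and 1 sit near reference 0, reads 2 and 3 near reference 2.
print(best, adcl(dist, best))
```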
“…Let H be the set of n haplotypes, and let X be the selected k-element subset of H. The objective is then to find X such that the branch-length distance from a randomly chosen haplotype in H to its closest neighboring haplotype in X is minimized over all possible k-element subsets of H (Matsen et al. 2013). Note that because the haplotypes in X are also in H, each of these haplotypes is its own closest neighbor, and we can equivalently consider either H or H \ X.…”
Section: Methods (mentioning; confidence: 99%)
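To make the remark about H and H \ X concrete, the sketch below builds a small hypothetical tree (topology, branch lengths, and leaf names are invented for illustration), computes leaf-to-leaf branch-length distances by depth-first traversal, and evaluates the average distance to the closest chosen leaf for every 2-element subset X, once averaging over all of H and once over H \ X. Because each member of X is at distance zero from itself, the two sums are identical and the averages differ only by the constant factor (n - k)/n, so both averaging sets select the same subset.

```python
from itertools import combinations

# Hypothetical unrooted tree: internal nodes "u" and "v"; leaves "a".."e" play the role of H.
EDGES = [
    ("a", "u", 1.0), ("b", "u", 2.0), ("u", "v", 1.5),
    ("c", "v", 1.0), ("d", "v", 3.0), ("e", "v", 0.5),
]


def leaf_distances(edges):
    """Return the leaves and the branch-length distance from each leaf to every node."""
    adj = {}
    for x, y, w in edges:
        adj.setdefault(x, []).append((y, w))
        adj.setdefault(y, []).append((x, w))
    leaves = [node for node, nbrs in adj.items() if len(nbrs) == 1]
    dist = {}
    for src in leaves:
        # Depth-first traversal; in a tree every node is reached by exactly one path.
        stack, seen = [(src, 0.0)], {src}
        while stack:
            node, d = stack.pop()
            dist[src, node] = d
            for nbr, w in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append((nbr, d + w))
    return leaves, dist


def adcl(points, chosen, dist):
    """Average over `points` of the distance to the closest member of `chosen`."""
    return sum(min(dist[h, x] for x in chosen) for h in points) / len(points)


H, D = leaf_distances(EDGES)
k = 2
subsets = list(combinations(H, k))
over_H = {X: adcl(H, X, D) for X in subsets}
over_rest = {X: adcl([h for h in H if h not in X], X, D) for X in subsets}
# Members of X contribute zero, so the two averages differ by the constant (n - k) / n.
print(min(over_H, key=over_H.get), min(over_rest, key=over_rest.get))
```

Here both searches return `('a', 'e')`; on this tree `('b', 'e')` attains the same value, and `min` keeps the first subset it encounters.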
“…In a detailed study of ADCL, Matsen et al. (2013) demonstrated that, unlike when choosing the subset that maximizes PD, the greedy algorithm need not give rise to the globally optimal ADCL solution. It is therefore necessary to produce alternative algorithms that seek to minimize ADCL.…”
Section: Methods (mentioning; confidence: 99%)
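The toy sketch below illustrates how a greedy strategy can fall short of the global ADCL optimum. It assumes a hypothetical caterpillar-shaped tree whose five leaves hang off a path-shaped backbone by zero-length branches, so the branch-length distance between two leaves is simply the difference of their positions along the backbone. The greedy rule used here (add, one at a time, the leaf that lowers ADCL the most) is the obvious one for illustration and is not taken from Matsen et al. (2013); exhaustive search serves only as ground truth for this tiny input.

```python
from itertools import combinations

# Hypothetical leaf positions along the backbone of a caterpillar tree.
POSITIONS = [0, 1, 5, 9, 10]
N = len(POSITIONS)


def dist(i, j):
    """Branch-length distance between leaves i and j (pendant edges have length 0)."""
    return abs(POSITIONS[i] - POSITIONS[j])


def adcl(chosen):
    """Average over all leaves of the distance to the closest chosen leaf."""
    return sum(min(dist(h, x) for x in chosen) for h in range(N)) / N


def greedy(k):
    """Repeatedly add the single leaf that lowers ADCL the most (ties: lowest index)."""
    chosen = []
    for _ in range(k):
        best = min((i for i in range(N) if i not in chosen),
                   key=lambda i: adcl(chosen + [i]))
        chosen.append(best)
    return chosen


def exhaustive(k):
    """Globally optimal k-subset by brute force (tiny inputs only)."""
    return min(combinations(range(N), k), key=adcl)


g, opt = greedy(2), exhaustive(2)
print(g, adcl(g))      # [2, 0] 2.0
print(opt, adcl(opt))  # (0, 3) 1.2
```

On these five leaves greedy first commits to the central leaf (position 5), after which no second leaf can bring the average below 2.0, whereas the pair at positions 0 and 9 reaches 1.2.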