2010
DOI: 10.1186/1748-7188-5-21

Sequence embedding for fast construction of guide trees for multiple sequence alignment

Abstract: Background: The most widely used multiple sequence alignment methods require sequences to be clustered as an initial step. Most sequence clustering methods require a full distance matrix to be computed between all pairs of sequences. This requires memory and time proportional to N² for N sequences. When N grows larger than 10,000 or so, this becomes increasingly prohibitive and can form a significant barrier to carrying out very large multiple alignments. Results: In this paper, we have tested variations on a class…

citations
Cited by 102 publications
(80 citation statements)
references
References 29 publications
“…We measured the proportion of correctly aligned columns out of all aligned columns in the reference sequences [Total Column (TC) score] of the 12 sequences, embedded in the larger datasets. This type of analysis is widely used and is the basis of the HomFam alignment benchmark system (12).…”
Section: Results
mentioning
confidence: 99%
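The Total Column (TC) score described in the statement above can be illustrated with a minimal sketch. It assumes that alignments are given as lists of equal-length gapped strings with `-` as the gap character, that the test alignment has already been restricted to the reference sequences in the same order, and that only reference columns with at least two residues count as aligned columns. The function names `residue_columns` and `tc_score` are hypothetical and do not come from any benchmark tool.

```python
def residue_columns(row):
    """Return, for each residue of a gapped row, the column it occupies."""
    return [col for col, ch in enumerate(row) if ch != "-"]


def tc_score(reference, test):
    """Fraction of reference columns reproduced intact in the test alignment.

    Assumption: both alignments contain the same sequences in the same
    order, so the i-th residue of a row is the same residue in both.
    """
    test_cols = [residue_columns(row) for row in test]
    next_residue = [0] * len(reference)  # per-sequence residue counter
    total = correct = 0
    for col in range(len(reference[0])):
        placed = set()  # test columns that this reference column maps to
        members = 0
        for i, row in enumerate(reference):
            if row[col] != "-":
                placed.add(test_cols[i][next_residue[i]])
                next_residue[i] += 1
                members += 1
        if members >= 2:  # assumed definition of an "aligned column"
            total += 1
            if len(placed) == 1:
                correct += 1
    return correct / total if total else 0.0


reference = ["AC-G", "ACTG", "A-TG"]
shifted = ["ACG-", "ACTG", "A-TG"]  # last residue of the first row misplaced
print(tc_score(reference, reference))
print(tc_score(reference, shifted))
```

A column counts as correct only when every one of its residues lands in a single column of the test alignment, so one misplaced residue is enough to lose the whole column.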
“…PartTree (10) groups the sequences quickly into clusters and then clusters the clusters, allowing very large guide trees to be made but at the expense of some accuracy, compared with the default Mafft program on which it is based. Clustal Omega (11) uses the mBed algorithm (12) to cluster the sequences on the basis of a small number of "seed" sequences. For N sequences, S seeds are used where S is typically proportional to log(N).…”
mentioning
confidence: 99%
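The seed-based idea in the statement above can be sketched as follows. This is an illustration only, not the mBed implementation: the k-mer Jaccard distance, the `scale` constant, and the rule of picking seeds evenly from the length-sorted sequences are all assumptions made for the example, and every function name is hypothetical. What it shows is that each sequence is turned into a vector of S distances, so the work is N × S distance calls rather than one call for every pair of sequences.

```python
import math


def kmer_set(seq, k=2):
    """All overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a, b, k=2):
    """Stand-in distance (assumption): 1 - Jaccard similarity of k-mer sets."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    if not union:
        return 0.0
    return 1.0 - len(ka & kb) / len(union)


def choose_seeds(seqs, scale=1.0):
    """Pick S seeds with S proportional to log2(N).

    Assumption: seeds are taken at even intervals from the sequences
    sorted by length; `scale` is an illustrative constant.
    """
    n = len(seqs)
    if n == 0:
        return []
    s = 1 if n == 1 else max(1, min(n, math.ceil(scale * math.log2(n))))
    ordered = sorted(seqs, key=len)
    step = n / s
    return [ordered[int(i * step)] for i in range(s)]


def embed(seqs, seeds):
    """Represent each sequence by its distances to the S seeds."""
    return [[kmer_distance(x, seed) for seed in seeds] for x in seqs]


sequences = [
    "ACGTACGT", "ACGTTCGT", "ACGAACGTA", "TTGTACGTAA",
    "GGGTACCC", "GGCTACCCA", "ACGTAC", "TTTTACGTCC",
]
seeds = choose_seeds(sequences)
vectors = embed(sequences, seeds)
print(len(seeds), len(vectors), len(vectors[0]))
```

The resulting vectors can be passed to any clustering method that works on points rather than on a full pairwise distance matrix.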
“…To perform sequence alignments, NDC80 from S. cerevisiae and its orthologs in Saccharomyces bayanus, S. kudriavzevii, S. mikatae (Scannell et al 2011), Lachancea (Kluyveromyces) thermotolerans, Kluyveromyces lactis, and Debaryomyces hansenii (http://genolevures.org) were translated using Transeq (Rice et al 2000) and then aligned using Clustal-O (Blackshields et al 2010). The similarity score was plotted for each position using Plotcon (Rice et al 2000) with a window size of 21 bp.…”
Section: Illumina Sequencing
mentioning
confidence: 99%
“…Although at present these trees do not have support measures associated with them, they offer valuable preliminary data analysis and could offer a new way of estimating guide trees for difficult phylogenetic problems. We hope that future improvements to PaHMM-Tree will help to alleviate some of its computational limitations, for example by implementing the mBed algorithm to reduce the number of distance calls, allowing it to process larger data sets (Blackshields et al 2010), or through the implementation of anchor points using suffix trees in the pairHMMs, allowing it to work with longer sequences more quickly (Gusfield 1997 …”
Section: Discussion
mentioning
confidence: 99%