2015
DOI: 10.1186/s13059-015-0688-z

Ultra-large alignments using phylogeny-aware profiles

Abstract: Many biological questions, including the estimation of deep evolutionary histories and the detection of remote homology between protein sequences, rely upon multiple sequence alignments and phylogenetic trees of large datasets. However, accurate large-scale multiple sequence alignment is very difficult, especially when the dataset contains fragmentary sequences. We present UPP, a multiple sequence alignment method that uses a new machine learning technique, the ensemble of hidden Markov models, which we propos…

Citations: cited by 130 publications (181 citation statements)
References: 33 publications (79 reference statements)
“…Numerous MSA methods have been developed, but only a few of these can analyze large datasets, and even fewer have been demonstrated to have good accuracy beyond a few hundred sequences [4]. The impact of multiple sequence alignment on downstream analyses is known to be substantial, with errors in multiple sequence alignment producing increased error rates in phylogeny estimation, false detection of positive selection, difficulties in detecting active sites in proteins, etc.…”
Section: Introduction (mentioning)
confidence: 99%
“…Our group has developed several techniques [4, 12, 17, 18] to improve the scalability of multiple sequence alignment methods to large datasets, of which PASTA [18] and UPP [4] provide the largest improvements. PASTA is an iterative divide-and-conquer method for co-estimating trees and alignments, in which each iteration begins with a maximum likelihood tree computed in the previous iteration, and then uses the tree to partition the sequences into small subsets that are local within the tree.…”
Section: Introduction (mentioning)
confidence: 99%
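
The partitioning step described in the statement above (using a tree to split sequences into small subsets that are local within the tree) can be sketched as follows. This is a minimal illustration, not PASTA's implementation: the quoted text does not state the splitting rule, so the sketch assumes that the tree is cut repeatedly at the edge that divides the sequences most evenly. The names `side_of`, `decompose`, `taxa`, and `max_size` are hypothetical.

```python
def side_of(adj, start, cut):
    """Return the nodes reachable from `start` without crossing the edge `cut`."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen and frozenset((u, v)) != cut:
                seen.add(v)
                stack.append(v)
    return seen


def decompose(edges, nodes, taxa, max_size):
    """Split a tree into subsets of at most `max_size` sequences each.

    `edges` is a list of (u, v) pairs, `nodes` the nodes of this subtree,
    and `taxa` the set of nodes that correspond to sequences.
    """
    local = sorted(n for n in nodes if n in taxa)
    if len(local) <= max_size:
        return [local]

    adj = {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    def imbalance(edge):
        # Difference in sequence counts between the two sides of the cut.
        side = side_of(adj, edge[0], frozenset(edge))
        k = sum(1 for n in side if n in taxa)
        return abs(2 * k - len(local))

    # Assumed rule: cut at the edge that splits the sequences most evenly.
    cut = min(edges, key=imbalance)
    left = side_of(adj, cut[0], frozenset(cut))
    right = set(nodes) - left

    parts = []
    for part in (left, right):
        sub_edges = [(u, v) for u, v in edges if u in part and v in part]
        parts += decompose(sub_edges, part, taxa, max_size)
    return parts


# Toy tree: six sequences (a-f) attached to a chain of internal nodes i1-i4.
edges = [("a", "i1"), ("b", "i1"), ("i1", "i2"), ("c", "i2"), ("i2", "i3"),
         ("d", "i3"), ("i3", "i4"), ("e", "i4"), ("f", "i4")]
nodes = {n for edge in edges for n in edge}
taxa = set("abcdef")
subsets = decompose(edges, nodes, taxa, max_size=3)
```

Because each subset comes from a connected piece of the tree, the sequences grouped together are neighbours in the tree, which is the sense of "local" in the statement. The sketch recomputes each candidate cut from scratch, so it is quadratic in the tree size and only suitable for illustration.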
“…This is also similar to UPP (Nguyen et al., 2015) in applying an accurate method to randomly-selected sequences. UPP uses PASTA (Mirarab et al., 2015), which is a combination of L-INS-i option of MAFFT, OPAL (Wheeler and Kececioglu, 2007) and FastTree (Price et al., 2010), to align core sequences, and then uses HMMalign (Finn et al., 2011) to add the remaining sequences to the core alignment.…”
Section: Results (mentioning)
confidence: 90%
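
The two-stage layout described in the statement above (align a randomly selected core accurately, then add the remaining sequences to it) starts from a split of the input like the one sketched below. This is a minimal sketch under stated assumptions: it only performs the split and does not call PASTA or HMMalign, the selection is plain uniform sampling, and `split_core`, `core_size`, and `seed` are hypothetical names rather than UPP options.

```python
import random


def split_core(seqs, core_size, seed=0):
    """Split `seqs` (name -> sequence) into a random core and the remainder.

    The core would be aligned with the accurate method; the remaining
    sequences would then be added to that core alignment one by one.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    names = sorted(seqs)
    chosen = set(rng.sample(names, min(core_size, len(names))))
    core = {n: seqs[n] for n in names if n in chosen}
    rest = {n: seqs[n] for n in names if n not in chosen}
    return core, rest


# Toy input: five short sequences, two of which form the core.
toy = {"s1": "ACGT", "s2": "ACGA", "s3": "ACG", "s4": "TCGT", "s5": "AC"}
core, rest = split_core(toy, core_size=2)
```

Keeping the core small is what makes the expensive alignment step affordable, while the cost of adding each remaining sequence grows only linearly with the number of sequences.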
“…Using a newer version (7.294), we reexamined the difference between normal guide trees and random chains, using two different benchmark criteria. For reference, other methods designed for large data, Clustal Omega version 1.2.1 (Sievers et al., 2011) and UPP version 2.0 (Nguyen et al., 2015) with several different options were included into the comparison.…”
Section: Methods (mentioning)
confidence: 99%