Edgar Garriga scite author profile

Multiple sequence alignments (MSAs) are used for structural [1,2] and evolutionary [1,2] predictions, but the complexity of aligning large datasets requires approximate solutions [3], including the progressive algorithm [4]. Progressive MSA methods start by aligning the most similar sequences and then incorporate the remaining sequences, from leaf to root, following a guide tree. Their accuracy declines substantially as the number of sequences is scaled up [5]. We introduce a regressive algorithm that enables MSA of up to 1.4 million sequences on a standard workstation and substantially improves accuracy on datasets larger than 10,000 sequences. Our regressive algorithm works in the opposite direction to the progressive algorithm: it begins by aligning the most dissimilar sequences. It uses an efficient divide-and-conquer strategy to run third-party alignment methods in linear time, regardless of their original complexity. Our approach will enable analyses of extremely large genomic datasets such as the recently announced Earth BioGenome Project, which comprises 1.5 million eukaryotic genomes [6].
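The divide-and-conquer idea in this abstract can be illustrated with a minimal Python sketch. It is not the T-Coffee implementation: the nested-tuple guide tree, the function names, and the rule that a subtree is represented by its longest sequence are all assumptions made for illustration. The sketch only lists which sequences would be handed to a third-party aligner in each sub-alignment, starting at the root; the alignment itself and the merging of sub-alignments are left out.

```python
# Illustrative sketch only: names, tree encoding and the "longest sequence"
# representative rule are assumptions, not the published implementation.

def leaves(tree):
    """List the sequence names under a tree node, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


def representative(tree, lengths):
    """Pick one sequence to stand for a subtree (assumed here: the longest)."""
    return max(leaves(tree), key=lambda name: lengths[name])


def split(tree, n):
    """Cut a binary tree, breadth-first from its root, into at most n subtrees."""
    frontier = [tree]
    while len(frontier) < n:
        internal = [i for i, t in enumerate(frontier) if not isinstance(t, str)]
        if not internal:
            break  # only leaves remain, nothing left to expand
        node = frontier.pop(internal[0])
        frontier.extend(node)  # replace the node by its children
    return frontier


def regressive_subproblems(tree, lengths, n):
    """Return the sub-alignment jobs (lists of sequence names), root first."""
    if n < 2:
        raise ValueError("n must be at least 2")
    if isinstance(tree, str):
        return []  # a single sequence needs no alignment
    parts = split(tree, n)
    jobs = [[representative(part, lengths) for part in parts]]
    for part in parts:
        # Each subtree is handled the same way; its representative appears
        # again in the child job, which is what would allow merging later.
        jobs.extend(regressive_subproblems(part, lengths, n))
    return jobs


# Hypothetical guide tree over eight sequences with made-up lengths.
tree = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))
lengths = {"a": 10, "b": 12, "c": 30, "d": 8, "e": 25, "f": 7, "g": 9, "h": 40}
jobs = regressive_subproblems(tree, lengths, 4)
for job in jobs:
    print(job)
```

In this toy run the first job gathers one sequence from each of four distant subtrees, so the most dissimilar sequences are aligned first, and no job ever holds more than `n` sequences, whatever the size of the dataset.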


Multiple Sequence Alignment Computation Using the T-Coffee Regressive Algorithm Implementation

Garriga, Tommaso, Magis et al. 2020

Towards the accurate alignment of over a million protein sequences: Current state of the art

Santus, Garriga, Deorowicz et al. 2023, Current Opinion in Structural Biology

Fast and accurate large multiple sequence alignments using root-to-leave regressive computation

Garriga, Tommaso, Magis et al. 2018, Preprint

Inferences derived from large multiple alignments of biological sequences are critical to many areas of biology, including evolution, genomics, biochemistry, and structural biology. However, the complexity of the alignment problem imposes the use of approximate solutions. The most common is the progressive algorithm, which starts by aligning the most similar sequences, incorporating the remaining ones following the order imposed by a guide-tree. We developed and

Large multiple sequence alignments with a root-to-leaf regressive method
