BackgroundTandemly Arrayed Gene (TAG) clusters are groups of paralogous genes that are found adjacent on a chromosome. TAGs represent an important repertoire of genes in eukaryotes. In addition to tandem duplication events, TAG clusters are affected during their evolution by other mechanisms, such as inversion and deletion events, that affect the order and orientation of genes. The DILTAG algorithm developed in [1] makes it possible to infer a set of optimal evolutionary histories explaining the evolution of a single TAG cluster, from an ancestral single gene, through tandem duplications (simple or multiple, direct or inverted), deletions and inversion events.ResultsWe present a general methodology, which is an extension of DILTAG, for the study of the evolutionary history of a set of orthologous TAG clusters in multiple species. In addition to the speciation events reflected by the phylogenetic tree of the considered species, the evolutionary events that are taken into account are simple or multiple tandem duplications, direct or inverted, simple or multiple deletions, and inversions. We analysed the performance of our algorithm on simulated data sets and we applied it to the protocadherin gene clusters of human, chimpanzee, mouse and rat.ConclusionsOur results obtained on simulated data sets showed a good performance in inferring the total number and size distribution of duplication events. A limitation of the algorithm is however in dealing with multiple gene deletions, as the algorithm is highly exponential in this case, and becomes quickly intractable.
Abstract. Given a phylogenetic tree involving Whole Genome Duplication events, we contribute to solving the problem of computing the rearrangement and DCJ distances on a branch of the tree linking a duplication node d to a speciation node or a leaf s. In the case of a genome G at s containing exactly two copies of each gene, the genome halving problem is to find a perfectly duplicated genome D at d minimizing the rearrangement distance with G. We generalize the existing exact linear-time algorithm for genome halving to the case of a genome G with missing gene copies. In the case of a known ancestral duplicated genome D, we develop a greedy approach for computing the distance between G and D, called the double distance. Two algorithms are developed in both cases of a genome G containing exactly two copies of each gene, or at most two copies of each gene (with missing gene copies). These algorithms are shown time-efficient and very accurate for both the rearrangement and DCJ distances.
Abstract. Given a phylogenetic tree involving Whole Genome Duplication events, we contribute to the problem of computing the rearrangement distance on a branch of a tree linking a duplication node d to a speciation node or a leaf s. In the case of a genome G at s containing exactly two copies of each gene, the genome halving problem is to find a perfectly duplicated genome D at d minimizing the rearrangement distance with G. We generalize the existing exact linear-time algorithm for genome halving to the case of a genome G with missing gene copies. In the case of a known ancestral duplicated genome D, we develop a greedy approach for computing the distance between G and D that is shown time-efficient and very accurate for both the rearrangement and DCJ distances.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.