Background: APOBEC3 (A3) proteins deaminate DNA cytosines and block the replication of retroviruses and retrotransposons. Each A3 gene encodes a protein with one or two conserved zinc-coordinating motifs (Z1, Z2 or Z3). The presence of one A3 gene in mice (Z2-Z3) and seven in humans, A3A-H (Z1a, Z2a-Z1b, Z2b, Z2c-Z2d, Z2e-Z2f, Z2g-Z1c, Z3), suggests extraordinary evolutionary flexibility. To gain insights into the mechanism and timing of A3 gene expansion and into the functional modularity of these genes, we analyzed the genomic sequences, expressed cDNAs and activities of the full A3 repertoire of three artiodactyl lineages: sheep, cattle and pigs.
Abstract. Reconciliation between a set of gene trees and a species tree is the most commonly used approach to infer the duplication and loss events in the evolution of gene families, given a species tree. When a species tree is not known, a natural algorithmic problem is to infer a species tree such that the corresponding reconciliation minimizes the number of duplications and/or losses. In this paper, we clarify several theoretical questions and study various algorithmic issues related to these two problems. (1) For a given gene tree T and species tree S, we show that there is a single history explaining T and consistent with S that minimizes gene losses, and that this history also minimizes the number of duplications. We describe a simple linear time and space algorithm to compute this parsimonious history that is not based on the Lowest Common Ancestor (LCA) mapping approach; (2) We show that the problem of computing a species tree that minimizes the number of gene duplications, given a set of gene trees, is in fact a slight variant of a supertree problem; (3) We show that deciding if a set of gene trees can be explained using only apparent duplications can be done efficiently, as well as computing a parsimonious species tree for such gene trees. We also characterize gene trees that can be explained using only apparent duplications in terms of compatible triplets of leaves.
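For contrast with the non-LCA algorithm mentioned in the abstract above, the classical LCA-mapping reconciliation can be sketched as follows. This is a minimal illustration of the standard technique, not the paper's algorithm, and it runs in quadratic rather than linear time. The nested-tuple tree encoding, the convention that a gene named `a_1` belongs to species `a`, and all function names are assumptions made for the example.

```python
def clusters(tree):
    """Return the leaf set of every node of a nested-tuple tree, in postorder."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    out, leaves = [], frozenset()
    for child in tree:
        sub = clusters(child)
        out.extend(sub)
        leaves |= sub[-1]          # last entry is the child's own cluster
    out.append(leaves)
    return out


def lca_cluster(species_clusters, species_set):
    """Smallest species-tree cluster containing species_set (i.e. its LCA)."""
    return min((c for c in species_clusters if species_set <= c), key=len)


def count_duplications(gene_tree, species_tree,
                       species_of=lambda gene: gene.split("_")[0]):
    """Count gene-tree nodes mapped to the same species node as a child."""
    sc = clusters(species_tree)
    dups = 0

    def visit(node):
        nonlocal dups
        if isinstance(node, str):
            return lca_cluster(sc, frozenset([species_of(node)]))
        child_maps = [visit(child) for child in node]
        here = lca_cluster(sc, frozenset().union(*child_maps))
        if any(m == here for m in child_maps):
            dups += 1              # node and one child share the same mapping
        return here

    visit(gene_tree)
    return dups


species = (("a", "b"), "c")
print(count_duplications((("a_1", "b_1"), "c_1"), species))            # 0
print(count_duplications((("a_1", "b_1"), ("a_2", "c_1")), species))   # 1
```

In the second call the root and its right child both map to the root of the species tree, so the root is counted as a duplication.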
The genome can be modeled as a set of strings (chromosomes) of distinguished elements called genes. Genome duplication is an important source of new gene functions and novel physiological pathways. Originally (ancestrally), a duplicated genome contains two identical copies of each chromosome, but through the genomic rearrangement mutational processes of reciprocal translocation (prefix and/or suffix exchanges between chromosomes) and substring reversals, this simple doubled structure is disrupted. At the time of observation, each of the chromosomes resulting from the accumulation of rearrangements can be decomposed into a succession of conserved segments, such that each segment appears exactly twice in the genome. We present exact algorithms for reconstructing the ancestral doubled genome in linear time, minimizing the number of rearrangement mutations required to derive the observed order of genes along the present-day chromosomes. Somewhat different techniques are required for a translocations-only model, a translocations/reversals model, both of these in the multichromosomal context (eukaryotic nuclear genomes), and a reversals-only model for single chromosome prokaryotic and organellar genomes. We apply these methods to the yeast genome, which is thought to have doubled, and to the liverwort mitochondrial genome, whose duplicate genes are unlikely to have arisen by genome doubling.
Background: A variety of methods based on sequence similarity, reconciliation, synteny or functional characteristics can be used to infer orthology and paralogy relations between genes of a given gene family G. But is a given set C of orthology/paralogy constraints possible, i.e., can they simultaneously co-exist in an evolutionary history for G? While previous studies have focused on full sets of constraints, here we consider the general case where C does not necessarily involve a constraint for each pair of genes. The problem is subdivided in two parts: (1) Is C satisfiable, i.e., can we find an event-labeled gene tree G inducing C? (2) Is there such a G which is consistent, i.e., such that all displayed triplet phylogenies are included in a species tree? Results: Previous results on the Graph sandwich problem can be used to answer (1), and we provide polynomial-time algorithms for satisfiability and consistency with a given species tree. We also describe a new polynomial-time algorithm for the case of consistency with an unknown species tree and full knowledge of pairwise orthology/paralogy relationships, as well as a branch-and-bound algorithm in the case when unknown relations are present. We show that our algorithms can be used in combination with ProteinOrtho, a sequence similarity-based orthology detection tool, to extract a set of robust orthology/paralogy relationships.
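As background for question (1) in the abstract above: in the special case where every pair of genes is labeled, a known result from the earlier literature is that the relation is satisfiable exactly when the graph whose edges are the orthologous pairs is a cograph (contains no induced path on four vertices). The sketch below tests that property by recursive decomposition. It does not handle partial constraint sets and is not the algorithm of the paper; the function names and the edge-list input format are assumptions made for the example.

```python
def components(vertices, neighbours):
    """Connected components of `vertices` under the `neighbours(u)` function."""
    comps, seen = [], set()
    for v in vertices:
        if v in seen:
            continue
        comp, stack = {v}, [v]
        while stack:
            u = stack.pop()
            for w in neighbours(u) & vertices:
                if w not in comp:
                    comp.add(w)
                    stack.append(w)
        seen |= comp
        comps.append(comp)
    return comps


def is_cograph(vertices, edges):
    """True if the graph has no induced P4.

    A graph on more than one vertex is a cograph iff the graph or its
    complement is disconnected and every resulting part is a cograph.
    """
    vertices = set(vertices)
    nbrs = {v: set() for v in vertices}
    for u, w in edges:
        nbrs[u].add(w)
        nbrs[w].add(u)

    def rec(vs):
        if len(vs) <= 1:
            return True
        parts = components(vs, lambda u: nbrs[u])
        if len(parts) == 1:
            # Connected: try splitting along the complement graph instead.
            parts = components(vs, lambda u: vs - nbrs[u] - {u})
            if len(parts) == 1:
                return False
        return all(rec(p) for p in parts)

    return rec(vertices)


# Orthology edges forming a path a-b-c-d cannot come from any labeled gene tree.
print(is_cograph("abcd", [("a", "b"), ("b", "c"), ("c", "d")]))   # False
print(is_cograph("abc", [("a", "b"), ("b", "c")]))                # True
```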
We consider two algorithmic questions related to the evolution of gene families. First, given a gene tree for a gene family, can the evolutionary history of this family be explained with only speciation and duplication events? Such gene trees are called DS-trees. We show that this question can be answered in linear time, and that a DS-tree induces a single species tree. We then study a natural extension of this problem: what is the minimum number of gene losses involved in an evolutionary history leading to an observed gene tree or set of gene trees? Based on our characterization of DS-trees, we propose a heuristic for this problem, and evaluate it on a dataset of plant gene families and on simulated data.
We find a large excess of short inversions, especially those involving a single gene, in comparison with a random inversion model. This is demonstrated through comparison of four pairs of bacterial genomes, using a specially-designed implementation of the Hannenhalli-Pevzner theory, and validated through experimentation on pairs of random genomes matched to the real pairs.