Abstract: Models of evolution by genome rearrangements are prone to two types of flaws: one is to ignore the diversity of susceptibility to breakage across genomic regions, and the other is to suppose that susceptibility values are given. Without necessarily supposing their precise localization, we call “solid” the regions that are improbably broken by rearrangements and “fragile” the regions outside solid ones. We propose a model of evolution by inversions where breakage probabilities vary across fragile regions and ov…
“…Simulations with Zombi are fast: with a starting genome of 500 genes and a species tree of 2000 taxa (extinct + extant), it takes around 1 minute on a 3.4 GHz laptop to simulate all the genomes (Figure S6). We validated that the distribution of waiting times between successive events was following an exponential distribution (Figure S7 and S8), that the distribution of intergene sizes at equilibrium was following a flat Dirichlet distribution, as expected from Biller et al 2016 (Figure S9), that the number of events and their extension occurs with a frequency according to their respective rates (Figure S10) and that the gene family size distribution followed a power-law when duplication rates are higher than loss rates and stretched-exponential in the opposite case (Figure S11). We also checked by hand the validity of many simple scenarios to detect possible inconsistencies in the algorithm.…”
Section: Performance and Validation (supporting)
confidence: 72%
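The waiting-time check mentioned in the statement above can be illustrated with a minimal sketch. It assumes a Gillespie-style scheme in which the time to the next event is drawn from an exponential distribution with the summed rate, and the event type is chosen in proportion to its rate. The function name, the event names and the rate values are hypothetical and are not taken from Zombi.

```python
import random


def simulate_waiting_times(rates, n_events, seed=0):
    """Draw waiting times and event types for a Gillespie-style simulation.

    rates: dict mapping a (hypothetical) event name to its rate.
    Returns (waits, kinds): one waiting time and one event name per event.
    """
    rng = random.Random(seed)
    total_rate = sum(rates.values())
    events = list(rates)
    weights = [rates[e] for e in events]
    waits, kinds = [], []
    for _ in range(n_events):
        # Time to the next event of any kind: Exponential(total_rate).
        waits.append(rng.expovariate(total_rate))
        # Which event fires: probability proportional to its own rate.
        kinds.append(rng.choices(events, weights=weights)[0])
    return waits, kinds


if __name__ == "__main__":
    rates = {"duplication": 2.0, "loss": 1.0, "inversion": 1.0}  # assumed values
    waits, kinds = simulate_waiting_times(rates, 20000, seed=1)
    print("mean waiting time:", sum(waits) / len(waits))
    print("expected mean:    ", 1 / sum(rates.values()))
    print("duplication share:", kinds.count("duplication") / len(kinds))
```

Under these assumptions the sample mean of the waiting times should approach the inverse of the total rate, and each event type should occur at a frequency close to its share of the total rate.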
“…For example, it is possible to use a species tree input by the user, to generate species trees with variable extinction and speciation rates, or to control the number of living lineages at each unit of time (Figure S5). At the genome level, Zombi can simulate genomes using branch-specific rates (Gu mode, allowing the user to simulate very specific scenarios such as one in which a certain lineage experiences a massive loss of genes), gene-family specific rates (Gm mode, which makes easier the process of using rates estimated from real datasets) and genomes accounting for intergenic regions (Gf mode) of variable length (drawn from a flat Dirichlet distribution; Biller et al 2016). At the sequence level, finally, the user can fine-tune the substitution rates to make them branch specific.…”
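Drawing intergene sizes from a flat Dirichlet distribution, as mentioned in the statement above, can be sketched as follows. This is an illustration only, not Zombi's code: it uses the standard fact that independent Exponential(1) draws, normalized by their sum, follow a Dirichlet distribution with all parameters equal to 1. The function name and the parameters `n_intergenes` and `total_length` are hypothetical.

```python
import random


def flat_dirichlet_intergene_sizes(n_intergenes, total_length, seed=0):
    """Split total_length into n_intergenes sizes with flat Dirichlet proportions."""
    if n_intergenes < 1:
        raise ValueError("n_intergenes must be at least 1")
    rng = random.Random(seed)
    # Independent Exponential(1) draws, normalized by their sum,
    # are distributed as Dirichlet(1, ..., 1).
    draws = [rng.expovariate(1.0) for _ in range(n_intergenes)]
    norm = sum(draws)
    return [total_length * d / norm for d in draws]


if __name__ == "__main__":
    sizes = flat_dirichlet_intergene_sizes(500, 100000.0, seed=3)
    print("number of intergenes:", len(sizes))
    print("total length:        ", sum(sizes))
    print("smallest and largest:", min(sizes), max(sizes))
```

The sizes here are real-valued. A simulator working with nucleotide counts would need an integer variant, for example by sorting uniformly drawn cut points along the total length.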
Most living organisms that ever existed on Earth have left no descendants. Because introgressions and lateral gene transfers are frequent, some of these extinct lineages have impacted the evolution of extant species and their ancestors. As a consequence, ignoring extinct lineages in evolutionary studies can lead to spurious conclusions. Here we present Zombi, a platform to simulate the evolution of species, genes and genomes taking extinct lineages into account. We demonstrate its utility by testing a statistical inference method used to detect introgression and show that ignoring the presence of extinct lineages yields inconsistent results.
“…First, the definition of weighted genomes [1, 3] opens combinatorial questions, one of which being the transformation of a genome into another in a minimum number of steps. In a previous paper [3] we solved the strict version of this problem, where genomes were forced to have the same total intergene sizes and only wDCJs were allowed.…”
Section: Discussion (mentioning)
confidence: 99%
“…In a previous publication [1], we have argued that intergenic sizes were a crucial parameter to infer genome rearrangement distances. Indeed, ignoring this information, as all published distance estimations were doing so far [2], leads to strong biases in all estimations and validation procedures.…”
Section: Introduction (mentioning)
confidence: 99%
“…Indeed it is known that such a space is huge [4, 5], which makes it hard to analyze; several methods have thus been devised to add genomic or epigenomic constraints to restrict the search space [6–8]. So far, the potential of intergenic sizes has only been explored for distance computations [1, 3]. We show that it can also contain information on the scenarios, by characterizing categories of DCJs that can be used in optimal DCJs and indels scenarios.…”
Background: Given two genomes that have diverged by a series of rearrangements, we infer minimum Double Cut-and-Join (DCJ) scenarios to explain their organization differences, coupled with indel scenarios to explain their intergene size distribution, where DCJs themselves also alter the sizes of broken intergenes. Results: We give a polynomial-time algorithm that, given two genomes with arbitrary intergene size distributions, outputs a DCJ scenario which optimizes on the number of DCJs, and given this optimal number of DCJs, optimizes on the total sum of the sizes of the indels. Conclusions: We show that there is valuable information in the intergene sizes concerning the rearrangement scenario itself. On simulated data we show that statistical properties of the inferred scenarios are closer to the true ones than those of DCJ-only scenarios, i.e. scenarios which do not handle intergene sizes.
Comparative genomics considers the detection of similarities and differences between extant genomes, and, based on more or less formalized hypotheses regarding the involved evolutionary processes, inferring ancestral states explaining the similarities and an evolutionary history explaining the differences. In this chapter, we focus on the reconstruction of the organization of ancient genomes into chromosomes. We review different methodological approaches and software, applied to a wide range of datasets from different kingdoms of life and at different evolutionary depths. We discuss relations with genome assembly, and potential approaches to validate computational predictions on ancient genomes that are almost always only accessible through these predictions.