Monophyletic groups (groups that consist of all of the descendants of a most recent common ancestor) arise naturally as a consequence of descent processes that result in meaningful distinctions between organisms. Aspects of monophyly are therefore central to fields that examine and use genealogical descent. In particular, studies in conservation genetics, phylogeography, population genetics, species delimitation, and systematics can all make use of mathematical predictions under evolutionary models about features of monophyly. One important calculation, the probability that a set of gene lineages is monophyletic under a two-species neutral coalescent model, has been used in many studies. Here, we extend this calculation for a species tree model that contains arbitrarily many species. We study the effects of species tree topology and branch lengths on the monophyly probability. These analyses reveal new behavior, including the maintenance of nontrivial monophyly probabilities for gene lineage samples that span multiple species and even for lineages that do not derive from a monophyletic species group. We illustrate the mathematical results using an example application to data from maize and teosinte.
Since early in the development of coalescent theory and phylogeography, coalescent formulas and related simulations have contributed to a probabilistic understanding of the shapes of multispecies gene trees (1-3), enabling novel predictions about gene tree shapes under evolutionary hypotheses (4, 5), new ways of testing hypotheses about gene tree discordances (6, 7), and new algorithms for problems of species tree inference (8, 9) and species delimitation (10, 11).
A "multispecies coalescent" model, in which coalescent processes on separate species tree branches merge back in time as species reach a common ancestor (12), has become a key tool for theoretical predictions, simulation design, and evaluation of inference methods, and as a null model for data analysis.
A fundamental concept in genealogical studies is that of monophyly. In a genealogy, a group that is monophyletic consists of all of the descendants of its most recent common ancestor (MRCA): every lineage in the group, and no lineage outside it, descends from this ancestor. Backward in time, a monophyletic group has all of its lineages coalesce with each other before any coalesces with a lineage from outside the group.
The phylogenetic and phylogeographic importance of monophyly traces to the fact that monophyly enables a natural definition of a genealogical unit. Such a unit can describe a distinctive set of organisms that differs from other groups of organisms in ways that are evolutionarily meaningful. Species can be delimited by characters present in every member of a species and absent outside the species, and that therefore can reflect monophyly (13, 14). In conservation biology, monophyly can be used as a prioritization criterion because groups with many monophyletic loci are likely to possess unique evolutionary features (15). Reciprocal monophyly, in whic...
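The definition of monophyly given above (a group is monophyletic when it consists of exactly the descendants of its MRCA) can be checked mechanically on a rooted gene tree. The following is a minimal sketch, not code from the paper: the nested-tuple tree encoding, the lineage labels, and the function names are all assumptions made for illustration.

```python
# Illustrative sketch only: the tree encoding and names are hypothetical.
# A rooted gene tree is a nested tuple; a leaf (sampled lineage) is a string.

def leaf_set(node):
    """Return the set of leaf labels descending from this node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))

def clades(node):
    """Yield the leaf set of every node (every clade) in the tree."""
    yield leaf_set(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)

def is_monophyletic(tree, group):
    """True if some node has exactly `group` as its set of descendants,
    i.e. every lineage in the group, and no lineage outside it,
    descends from that node."""
    target = frozenset(group)
    return any(clade == target for clade in clades(tree))

# Hypothetical gene tree with lineages sampled from species a, b, and c.
gene_tree = ((("a1", "a2"), "b1"), ("b2", "c1"))

a_lineages_monophyletic = is_monophyletic(gene_tree, {"a1", "a2"})
b_lineages_monophyletic = is_monophyletic(gene_tree, {"b1", "b2"})
```

In this toy tree the two lineages from species a form a clade, whereas the two lineages from species b do not, because b1 coalesces with the a lineages before it coalesces with b2. The sketch only tests monophyly on a given tree; it does not compute the monophyly probabilities under the multispecies coalescent that the abstract describes.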
Bacterial genomes exhibit widespread horizontal gene transfer, resulting in highly variable genome content that complicates the inference of genetic interactions. In this study, we develop a method for detecting coevolving genes from large datasets of bacterial genomes based on pairwise comparisons of closely related individuals, analogous to a pedigree study in eukaryotic populations. We apply our method to pairs of genes from the Staphylococcus aureus accessory genome of over 75,000 annotated gene families using a database of over 40,000 whole genomes. We find many pairs of genes that appear to be gained or lost in a coordinated manner, as well as pairs where the gain of one gene is associated with the loss of the other. These pairs form networks of rapidly coevolving genes, primarily consisting of genes involved in virulence, mechanisms of horizontal gene transfer, and antibiotic resistance, particularly the SCCmec complex. While we focus on gene gain and loss, our method can also detect genes that tend to acquire substitutions in tandem, or genotype-phenotype or phenotype-phenotype coevolution. Finally, we present the R package that allows for the computation of our method.
FST is a statistic that is frequently used to analyze population structure. Recent work has shown that FST depends strongly on the underlying genetic diversity of a locus from which it is computed... The population-genetic statistic FST is used widely to describe allele frequency distributions in subdivided populations. The increasing availability of DNA sequence data has recently enabled computations of FST from sequence-based “haplotype loci.” At the same time, theoretical work has revealed that FST has a strong dependence on the underlying genetic diversity of a locus from which it is computed, with high diversity constraining values of FST to be low. In the case of haplotype loci, for which two haplotypes that are distinct over a specified length along a chromosome are treated as distinct alleles, genetic diversity is influenced by haplotype length: longer haplotype loci have the potential for greater genetic diversity. Here, we study the dependence of FST on haplotype length. Using a model in which a haplotype locus is sequentially incremented by one biallelic locus at a time, we show that increasing the length of the haplotype locus can either increase or decrease the value of FST, and usually decreases it. We compute FST on haplotype loci in human populations, finding a close correspondence between the observed values and our theoretical predictions. We conclude that effects of haplotype length are valuable to consider when interpreting FST calculated on haplotypic data.
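The dependence of FST on haplotype length can be illustrated with a toy calculation. The sketch below assumes a heterozygosity-based definition, FST = (HT - HS) / HT, with equally weighted populations; the abstract does not state which estimator the authors use, so this choice, the made-up haplotype samples, and all function names are assumptions for illustration only.

```python
from collections import Counter

def haplotype_freqs(samples, length):
    """Treat the first `length` sites of each sampled haplotype as one
    allele and return the allele frequencies in the sample."""
    counts = Counter(h[:length] for h in samples)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def fst(pop_freqs):
    """FST = (H_T - H_S) / H_T for equally weighted populations (assumed
    definition). `pop_freqs` is a list of dicts mapping allele -> frequency."""
    k = len(pop_freqs)
    alleles = set().union(*pop_freqs)
    # Mean within-population heterozygosity.
    h_s = sum(1 - sum(f ** 2 for f in p.values()) for p in pop_freqs) / k
    # Heterozygosity of the pooled population (mean allele frequencies).
    mean = {a: sum(p.get(a, 0.0) for p in pop_freqs) / k for a in alleles}
    h_t = 1 - sum(f ** 2 for f in mean.values())
    return (h_t - h_s) / h_t if h_t > 0 else float("nan")

# Made-up haplotypes over two biallelic sites (0/1), two populations.
pop1 = ["00", "00", "01", "01"]
pop2 = ["10", "10", "11", "11"]

# FST of the haplotype locus as it is extended one site at a time.
fst_by_length = {
    length: fst([haplotype_freqs(pop1, length), haplotype_freqs(pop2, length)])
    for length in (1, 2)
}
```

In this example the populations are fixed for different alleles at the first site, so FST is 1 for the one-site locus. Adding the second site, which is variable within both populations, raises within-population diversity and lowers FST to 1/3, in line with the statement that high diversity constrains FST to be low.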