A central challenge in the analysis of genetic variation is to provide realistic genome simulation across millions of samples. Present-day coalescent simulations do not scale well, or use approximations that fail to capture important long-range linkage properties. Analysing the results of simulations also presents a substantial challenge, as current methods to store genealogies consume a great deal of space, are slow to parse and do not take advantage of shared structure in correlated trees. We solve these problems by introducing sparse trees and coalescence records as the key units of genealogical analysis. Using these tools, exact simulation of the coalescent with recombination for chromosome-sized regions over hundreds of thousands of samples is possible, and substantially faster than present-day approximate methods. We can also analyse the results orders of magnitude more quickly than with existing methods.
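To make the idea of a coalescence record concrete, here is a minimal sketch in Python. It is not the algorithm of the paper: it simulates only the standard Kingman coalescent without recombination, so every record spans the whole interval. The record fields (`left`, `right`, `node`, `children`, `time`) and the function name `simulate_records` are assumptions chosen for illustration.

```python
import random
from typing import List, NamedTuple, Optional, Tuple


class CoalescenceRecord(NamedTuple):
    """One coalescence event: `children` merge into `node` at `time`
    over the genomic interval [left, right). Field names are illustrative."""
    left: float
    right: float
    node: int
    children: Tuple[int, int]
    time: float


def simulate_records(n: int, length: float = 1.0,
                     seed: Optional[int] = None) -> List[CoalescenceRecord]:
    """Simulate a Kingman coalescent for n samples WITHOUT recombination
    (a simplifying assumption), returning n - 1 records in time order."""
    if n < 2:
        raise ValueError("need at least two samples")
    rng = random.Random(seed)
    lineages = list(range(n))   # samples are nodes 0 .. n-1
    next_node = n               # internal nodes are numbered from n upwards
    t = 0.0
    records = []
    while len(lineages) > 1:
        k = len(lineages)
        # With k lineages, pairs coalesce at total rate k(k-1)/2.
        t += rng.expovariate(k * (k - 1) / 2)
        a, b = rng.sample(lineages, 2)
        lineages.remove(a)
        lineages.remove(b)
        records.append(
            CoalescenceRecord(0.0, length, next_node, (min(a, b), max(a, b)), t)
        )
        lineages.append(next_node)
        next_node += 1
    return records


if __name__ == "__main__":
    for rec in simulate_records(5, seed=1):
        print(rec)
```

With recombination, different genomic intervals would have different trees, and records whose intervals overlap adjacent trees are what allow shared structure to be stored once; the sketch above does not attempt that.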
Our focus here is on the infinitesimal model. In this model, one or several quantitative traits are described as the sum of a genetic and a non-genetic component, the first being distributed within families as a normal random variable centred at the average of the parental genetic components, and with a variance independent of the parental traits. Thus, the variance that segregates within families is not perturbed by selection, and can be predicted from the variance components. This does not necessarily imply that the trait distribution across the whole population should be Gaussian, and indeed selection or population structure may have a substantial effect on the overall trait distribution. One of our main aims is to identify some general conditions on the allelic effects for the infinitesimal model to be accurate. We first review the long history of the infinitesimal model in quantitative genetics. Then we formulate the model at the phenotypic level in terms of individual trait values and relationships between individuals, but including different evolutionary processes: genetic drift, recombination, selection, mutation, population structure, …. We give a range of examples of its application to evolutionary questions related to stabilising selection, assortative mating, effective population size and response to selection, habitat preference and speciation. We provide a mathematical justification of the model as the limit as the number M of underlying loci tends to infinity of a model with Mendelian inheritance, mutation and environmental noise, when the genetic component of the trait is purely additive. We also show how the model generalises to include epistatic effects. We prove in particular that, within each family, the genetic components of the individual trait values in the current generation are indeed normally distributed with a variance independent of ancestral traits, up to an error of order 1∕√M. Simulations suggest that in some cases the convergence may be as fast as 1∕M.
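The within-family rule described in this abstract can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the authors' code: the segregation variance `v_seg` is treated as a fixed constant (any dependence on the relatedness of the parents is ignored), and all function and parameter names are hypothetical.

```python
import random
from statistics import fmean, pvariance
from typing import List, Optional


def offspring_genetic_value(z_mother: float, z_father: float,
                            v_seg: float, rng: random.Random) -> float:
    """Draw an offspring genetic component: normal, centred at the
    midparent value, with variance v_seg that does not depend on the
    parental trait values."""
    midparent = 0.5 * (z_mother + z_father)
    return rng.gauss(midparent, v_seg ** 0.5)


def offspring_phenotype(z_mother: float, z_father: float, v_seg: float,
                        v_env: float, rng: random.Random) -> float:
    """Phenotype = genetic component + independent environmental noise."""
    genetic = offspring_genetic_value(z_mother, z_father, v_seg, rng)
    return genetic + rng.gauss(0.0, v_env ** 0.5)


def simulate_family(z_mother: float, z_father: float, v_seg: float,
                    size: int, seed: Optional[int] = None) -> List[float]:
    """Genetic components of `size` full siblings from one pair of parents."""
    rng = random.Random(seed)
    return [offspring_genetic_value(z_mother, z_father, v_seg, rng)
            for _ in range(size)]


if __name__ == "__main__":
    family = simulate_family(1.0, 3.0, v_seg=0.25, size=20000, seed=42)
    # Sample mean should be close to the midparent value (2.0) and the
    # sample variance close to v_seg, whatever the parental values are.
    print(fmean(family), pvariance(family))
```

Because the within-family variance is fixed, selecting particular parents shifts the family mean but leaves the spread of siblings around it unchanged, which is the property the abstract emphasises.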
Preface
Chapter 1. Superprocesses as Diffusion Approximations. Summary; The Dawson-Watanabe superprocess; 1.1. Branching Brownian motion; 1.2. A martingale characterisation; 1.3. The Feller diffusion; 1.4. Rescaling and tightness; 1.5. The Dawson-Watanabe martingale problem; 1.6. The method of duality; 1.7. A more general class of superprocesses; 1.8. Infinite initial measures; 1.9. Historical superprocesses; The Fleming-Viot superprocess; 1.10. The stepwise mutation model; 1.11. The Fleming-Viot martingale problem; 1.12. A dual process for Fleming-Viot
Chapter 2. Qualitative Behaviour I. Summary; The Dawson-Watanabe superprocess via its dual; 2.1. A series solution to the evolution equation; 2.2. Moments of the Dawson-Watanabe superprocess; 2.3. The density in one dimension; 2.4. The spde viewpoint; 2.5. Occupation times; 2.6. Continuity and discontinuity; 2.7. The extinction/persistence dichotomy; The Fleming-Viot superprocess: first properties; 2.8. Moments; 2.9. The density in one dimension
Chapter 3. The Le Gall Representation. Summary; 3.1. The branching process in random walk; 3.2. The Feller rescaling (again); 3.3. The Evans Immortal Particle; 3.4. Other skeletons; 3.5. Le Jan's construction; 3.6. The Brownian Snake; 3.7. The infinite variance snake; 3.8. Superprocesses and subordination
Chapter 4. The Relationship Between Our Two Classes of Superprocess. Summary; 4.1. Approximating particle systems revisited; 4.2. Generators revisited; 4.3. The generator in polar coordinates; 4.4. Consequences of the polar form of the generator
Chapter 5. A Countable Representation. Summary; 5.1. A second look at the moment equations for Fleming-Viot; 5.2. The lookdown process; 5.3. The modified Donnelly-Kurtz construction; 5.4. Incorporating selection; 5.5. Some old results revisited
Chapter 6. Qualitative Behaviour II. Summary; 6.1. Cluster representations; 6.2. The historical modulus of continuity; 6.3. The Hausdorff measure of the support; 6.4. Palm distributions for the Dawson-Watanabe superprocess; 6.5. Charging and hitting sets; 6.6. Intersection and collision local times
Chapter 7.
We investigate a new model for populations evolving in a spatial continuum. This model can be thought of as a spatial version of the Λ-Fleming-Viot process. It explicitly incorporates both small scale reproduction events and large scale extinction-recolonisation events. The lineages ancestral to a sample from a population evolving according to this model can be described in terms of a spatial version of the Λ-coalescent. Using a technique of Evans (1997), we prove existence and uniqueness in law for the model. We then investigate the asymptotic behaviour of the genealogy of a finite number of individuals sampled uniformly at random (or more generally 'far enough apart') from a two-dimensional torus of sidelength L as L → ∞. Under appropriate conditions (and on a suitable timescale) we can obtain as limiting genealogical processes a Kingman coalescent, a more general Λ-coalescent or a system of coalescing Brownian motions (with a non-local coalescence mechanism).
For a genetic locus carrying a strongly beneficial allele which has just fixed in a large population, we study the ancestry at a linked neutral locus. During this ``selective sweep'' the linkage between the two loci is broken up by recombination and the ancestry at the neutral locus is modeled by a structured coalescent in a random background. For large selection coefficients $\alpha$ and under an appropriate scaling of the recombination rate, we derive a sampling formula with an order of accuracy of $\mathcal{O}((\log \alpha)^{-2})$ in probability. In particular we see that, with this order of accuracy, in a sample of fixed size there are at most two nonsingleton families of individuals which are identical by descent at the neutral locus from the beginning of the sweep. This refines a formula going back to the work of Maynard Smith and Haigh, and complements recent work of Schweinsberg and Durrett on selective sweeps in the Moran model. Comment: Published at http://dx.doi.org/10.1214/105051606000000114 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)
What determines the genetic contribution that an individual makes to future generations? With biparental reproduction, each individual leaves a "pedigree" of descendants, determined by the biparental relationships in the population. The pedigree of an individual constrains the lines of descent of each of its genes. An individual's reproductive value is the expected number of copies of each of its genes that is passed on to distant generations conditional on its pedigree. For the simplest model of biparental reproduction (analogous to the Wright-Fisher model), an individual's reproductive value is determined within ~10 generations, independent of population size. Partial selfing and subdivision do not greatly slow this convergence. Our central result is that the probability that a gene will survive is proportional to the reproductive value of the individual that carries it and that, conditional on survival, after a few tens of generations, the distribution of the number of surviving copies is the same for all individuals, whatever their reproductive value. These results can be generalized to the joint distribution of surviving blocks of the ancestral genome. Selection on unlinked loci in the genetic background may greatly increase the variance in reproductive value, but the above results nevertheless still hold. The almost linear relationship between survival probability and reproductive value also holds for weakly favored alleles. Thus, the influence of the complex pedigree of descendants on an individual's genetic contribution to the population can be summarized through a single number: its reproductive value.
The most obvious feature of sexual reproduction is that each individual has two parents. Yet, the pedigrees that describe biparental relationships have received surprisingly little attention, compared with the genealogies that describe the uniparental relationships of genes.
(Throughout, we refer to relationships between genes as their "genealogy", in contrast to the "pedigree" of biparental relationships; genealogy should be understood as a shorthand for "gene genealogy".) Following the rediscovery of Mendelian genetics, attention focused on the random genetic drift of discrete alleles and on the converse process of inbreeding, by which genes become identical by descent. There has of course been substantial work on the fate of genes within a given pedigree (e.g., Smith 1976; Cannings et al. 1978; Thompson et al. 1978), but relatively little on the pedigrees themselves. Pedigrees are of interest in their own right: it is natural to ask who our ancestors were (Chang 1999; Rohde et al. 2004) and, conversely, how many descendants we will each leave. But, from a genetic point of view, the pedigree constrains what genes can be passed on: with Mendelian inheritance, selection acts solely through the different contributions made by individuals to the pedigree. The recent availability of genomic sequences may focus more attention on pedigrees: given sufficient sequence, we can infer the pedigree many generat...
We determine that the continuous-state branching processes for which the genealogy, suitably time-changed, can be described by an autonomous Markov process are precisely those arising from α-stable branching mechanisms. The random ancestral partition is then a time-changed Λ-coalescent, where Λ is the Beta-distribution with parameters 2 − α and α, and the time change is given by Z^{1−α}, where Z is the total population size. For α = 2 (Feller's branching diffusion) and Λ = δ_0 (Kingman's coalescent), this is in the spirit of (a non-spatial version of) Perkins' Disintegration Theorem. For α = 1 and Λ the uniform distribution on [0, 1], this is the duality discovered by Bertoin & Le Gall (2000) between the norming of Neveu's continuous state branching process and the Bolthausen-Sznitman coalescent. We present two approaches: one, exploiting the 'modified lookdown construction', draws heavily on Donnelly & Kurtz (1999); the other is based on direct calculations with generators.
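As a worked illustration of the Λ-coalescent named in this abstract: for a Λ-coalescent, the standard rate at which a given set of k out of b lineages merges is the integral of x^(k−2)(1−x)^(b−k) against Λ. When Λ is the Beta(2 − α, α) distribution with 0 < α < 2, this integral evaluates to B(k − α, b − k + α) / B(2 − α, α), where B is the Beta function. The Python sketch below computes that ratio; the function names are hypothetical and the time change by the population size is not modelled.

```python
from math import exp, lgamma


def log_beta(x: float, y: float) -> float:
    """Logarithm of the Beta function B(x, y) for x, y > 0."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)


def beta_coalescent_rate(b: int, k: int, alpha: float) -> float:
    """Rate at which a specific k-tuple among b lineages merges in the
    Beta(2 - alpha, alpha)-coalescent:
        lambda_{b,k} = B(k - alpha, b - k + alpha) / B(2 - alpha, alpha).
    """
    if not 2 <= k <= b:
        raise ValueError("require 2 <= k <= b")
    if not 0.0 < alpha < 2.0:
        raise ValueError("require 0 < alpha < 2")
    return exp(log_beta(k - alpha, b - k + alpha) - log_beta(2 - alpha, alpha))


if __name__ == "__main__":
    # alpha = 1 gives the Bolthausen-Sznitman coalescent (Lambda uniform on [0, 1]).
    for k in range(2, 5):
        print(k, beta_coalescent_rate(4, k, alpha=1.0))
```

A useful sanity check on any such rates is the consistency relation λ_{b,k} = λ_{b+1,k} + λ_{b+1,k+1}, which holds for every Λ-coalescent; as α approaches 2 the rates for k > 2 vanish, matching the Kingman case mentioned in the abstract.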