Why to account for finite sites in population genetic studies and how to do this with Jaatha 2.0

Mathew, Lisha; Staab, Paul; Rose, Laura; Metzler, Dirk

doi:10.1002/ece3.722

Cited by 17 publications (17 citation statements)

References 79 publications


“…In particular, two robust predictions concerning the demography emerge from our results. First, we estimate a surprisingly ancient split time between northern and southern Sweden, about 153 kya (124–182 kya), which is older than a previous estimate of approximately 14 kya close to the beginning of our current warm period (François et al. 2008), but more similar to estimates for the split time among Spanish and Italian A. thaliana of 83 kya (Mathew et al. 2013). This old split time does not depend on data preprocessing (such as subsampling) and it is not specific to our best-fitting model, but robust across various model topologies.…”

Section: Discussion (supporting; confidence: 46%)


Keeping It Local: Evidence for Positive Selection in Swedish Arabidopsis thaliana

Huber, Nordborg, Hermisson et al. 2014. Molecular Biology and Evolution.


Detecting positive selection in species with heterogeneous habitats and complex demography is notoriously difficult and prone to statistical biases. The model plant Arabidopsis thaliana exemplifies this problem: In spite of the large amounts of data, little evidence for classic selective sweeps has been found. Moreover, many aspects of the demography are unclear, which makes it hard to judge whether the few signals are indeed signs of selection, or false positives caused by demographic events. Here, we focus on Swedish A. thaliana and we find that the demography can be approximated as a two-population model. Careful analysis of the data shows that such a two-island model is characterized by a very old split time that significantly predates the last glacial maximum followed by secondary contact with strong migration. We evaluate selection based on this demography and find that this secondary contact model strongly affects the power to detect sweeps. Moreover, it affects the power differently for northern Sweden (more false positives) as compared with southern Sweden (more false negatives). However, even when the demographic history is accounted for, sweep signals in northern Sweden are stronger than in southern Sweden, with little or no positional overlap. Further simulations including the complex demography and selection confirm that this is not compatible with global selection acting on both populations, and thus can be taken as evidence for local selection within subpopulations of Swedish A. thaliana. This study demonstrates the necessity of combining demographic analyses and sweep scans for the detection of selection, particularly when selection acts predominantly locally.



“…Furthermore, the branch to A. lyrata is long enough, so that recurrent mutations are nonnegligible. This in turn might lead to biases when estimating population genetic parameters (Mathew et al. 2013). To avoid this, we based our inference on the folded jSFS.…”

Section: Methods (mentioning; confidence: 99%)

Keeping It Local: Evidence for Positive Selection in Swedish Arabidopsis thaliana

Huber, Nordborg, Hermisson et al. 2014. Molecular Biology and Evolution.

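The Huber et al. (2014) statement above explains that recurrent mutations on the long branch to A. lyrata could bias inference, so the authors based it on the folded jSFS. Folding discards the ancestral/derived labelling: a site with i derived copies in P1 and j in P2 is merged with the complementary cell (n1 − i, n2 − j), and each site is indexed by the allele that is rarer in the pooled sample. The following is a minimal Python sketch of that operation, not the implementation used by Huber et al. or by Jaatha. The function name `fold_jsfs` is hypothetical, and the handling of ties (cells where i + j equals half the pooled sample size) is an assumed convention that splits the count evenly between the cell and its complement.

```python
def fold_jsfs(jsfs):
    """Fold an unfolded two-population JSFS given as a list of lists.

    jsfs[i][j] counts sites where the derived allele is carried by i of the
    n1 samples from P1 and j of the n2 samples from P2. After folding, cells
    are indexed by the allele that is rarer in the pooled sample, so the
    result no longer depends on knowing which allele is ancestral.
    """
    n1 = len(jsfs) - 1
    n2 = len(jsfs[0]) - 1
    total = n1 + n2
    folded = [[0.0] * (n2 + 1) for _ in range(n1 + 1)]
    for i in range(n1 + 1):
        for j in range(n2 + 1):
            count = jsfs[i][j]
            ci, cj = n1 - i, n2 - j  # the same site with allele labels swapped
            if 2 * (i + j) < total:
                folded[i][j] += count  # i, j already count the pooled minor allele
            elif 2 * (i + j) > total:
                folded[ci][cj] += count  # re-index by the pooled minor allele
            else:
                # Tie in the pooled sample (assumed convention): split the
                # count evenly between this cell and its complement.
                folded[i][j] += count / 2
                folded[ci][cj] += count / 2
    return folded


if __name__ == "__main__":
    unfolded = [[0, 1, 2], [3, 4, 5], [6, 7, 0]]  # toy JSFS for n1 = n2 = 2
    for row in fold_jsfs(unfolded):
        print(row)
```

Folding preserves the total number of sites; only cells whose indices sum to at most half the pooled sample size can be non-zero afterwards.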

“…If it is common, neglecting recurrent mutation can bias inferences of mutation rate, population size, and selection (Desai and Plotkin 2008; Mathew et al. 2013). Applying our present theory thus requires that the mutation rate be high enough to create a substantial number of triallelic sites for inference, but not so high that a large fraction of biallelic or triallelic sites are affected by recurrent mutation.…”

Section: Correlated Fitness at Triallelic Sites (mentioning; confidence: 91%)

Triallelic Population Genomics for Inferring Correlated Fitness Effects of Same Site Nonsynonymous Mutations

et al. 2016


The distribution of mutational effects on fitness is central to evolutionary genetics. Typical univariate distributions, however, cannot model the effects of multiple mutations at the same site, so we introduce a model in which mutations at the same site have correlated fitness effects. To infer the strength of that correlation, we developed a diffusion approximation to the triallelic frequency spectrum, which we applied to data from Drosophila melanogaster. We found a moderate positive correlation between the fitness effects of nonsynonymous mutations at the same codon, suggesting that both mutation identity and location are important for determining fitness effects in proteins. We validated our approach by comparing it to biochemical mutational scanning experiments, finding strong quantitative agreement, even between different organisms. We also found that the correlation of mutational fitness effects was not affected by protein solvent exposure or structural disorder. Together, our results suggest that the correlation of fitness effects at the same site is a previously overlooked yet fundamental property of protein evolution.

KEYWORDS: diffusion approximation; distribution of fitness effects; Drosophila melanogaster; nonsynonymous mutations; triallelic sites

Mutations create genetic variation within populations, some of which causes differential fitness among individuals upon which natural selection operates. The effects of mutations on fitness range from strongly deleterious to strongly beneficial, and the distribution of fitness effects (DFE) is key for many problems in genetics, from the evolution of sex (Barton and Charlesworth 1998) to the architecture of human disease (Di Rienzo 2006). For protein-coding regions, there are generally many strongly deleterious or lethal mutations, a similar number of moderately deleterious or nearly neutral mutations, and a small number of beneficial mutations.

The DFE may be determined experimentally through direct measurements of mutation fitness effects in clonal populations of viruses, bacteria, or yeast (Wloch et al. 2001; Sanjuán et al. 2004), and recent studies have provided high-resolution DFEs for single genes (Bank et al. 2014; and for beneficial mutations (Levy et al. 2015). The DFE may also be inferred from comparative (Nielsen and Yang 2003; Tamuri et al. 2012) or population genetic (Williamson et al. 2005; Eyre-Walker et al. 2006; Keightley and Eyre-Walker 2007; Boyko et al. 2008) data, although these approaches have little power for strongly deleterious mutations.

In the typical population genetic approach for estimating the DFE, the population demography is first inferred using a putatively neutral class of mutations, and the DFE for another class of mutations is inferred by modeling the distribution of allele frequencies expected under a model of demography plus selection. Most population genetic inference has focused on biallelic loci, for which the ancestral allele and a single mutant (derived) allele are segregating in the population. When many indi...

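The citation statement from the triallelic study above, like the Huber et al. (2014) statement, turns on how often a single site is hit by more than one mutation. A simple way to gauge this is to assume that the number of mutations K at a site is Poisson-distributed with mean λ, the per-site mutation rate times the total branch length of the genealogy. Then the fraction of mutated sites that were hit more than once is P(K ≥ 2) / P(K ≥ 1) = (1 − e^(−λ)(1 + λ)) / (1 − e^(−λ)), which is approximately λ/2 when λ is small. The sketch below evaluates this ratio. It is an illustrative back-of-the-envelope model only, not the finite-sites model used in Jaatha 2.0, and the function name `recurrent_fraction` is hypothetical.

```python
import math


def recurrent_fraction(lam):
    """Fraction of sites with at least one mutation that carry two or more.

    Assumes K ~ Poisson(lam) mutations per site, where lam is the per-site
    mutation rate times the total branch length (a simplifying assumption).
    """
    if lam <= 0:
        return 0.0
    # P(K >= 1) = 1 - e^{-lam}, computed with expm1 for accuracy at small lam.
    p_any = -math.expm1(-lam)
    # P(K >= 2) = P(K >= 1) - P(K = 1), with P(K = 1) = lam * e^{-lam}.
    p_multi = p_any - lam * math.exp(-lam)
    return p_multi / p_any


if __name__ == "__main__":
    for lam in (0.001, 0.01, 0.1, 1.0):
        print(f"lam = {lam:<6} fraction hit more than once = {recurrent_fraction(lam):.4f}")
```

For example, with λ = 0.01 about 0.5% of mutated sites carry more than one mutation, while with λ = 0.1 the fraction is close to 5%. Under this crude model, recurrent mutation becomes non-negligible once the expected number of mutations per site approaches a few percent.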

“…In our simulation study we consider the JSFS J of two populations P1 and P2 (e.g., J[x, y] = z means that there are z positions in our aligned data that are found in x samples in P1 and in y samples in P2). As already extensively tested in Tellier et al. (2011), we further coarsen the JSFS into Jaatha’s default set of summary statistics (SS) based upon frequency classes (for a description see Naduvilezhath et al., 2011; Tellier et al., 2011; Mathew et al., 2013; and an example in Supplementary Figure S1), which has been shown to perform well when estimating neutral demographic models. These SS divide the high- and low-frequency variants into single frequency classes and the middle-frequency variants into fewer classes, resulting in 23 frequency classes in total.…”

Section: Introduction (mentioning; confidence: 99%)

“…Although it has been well-described that selection and demography may result in similar patterns of diversity, the ability to jointly estimate these two processes has remained elusive. Here, we use simulation to explore the utility of the joint site frequency spectrum to estimate selection and demography simultaneously, including developing an extension of the previously proposed Jaatha program (Mathew et al., 2013). We evaluate both complete and incomplete selective sweeps under an isolation-with-migration model with and without population size change (both population growth and bottlenecks).…”

(mentioning; confidence: 99%)

Evaluating the ability of the pairwise joint site frequency spectrum to co-estimate selection and demography

Mathew and Jensen 2015. Front. Genet. (self-citation)


The ability to infer the parameters of positive selection from genomic data has many important implications, from identifying drug-resistance mutations in viruses to increasing crop yield by genetically integrating favorable alleles. Although it has been well-described that selection and demography may result in similar patterns of diversity, the ability to jointly estimate these two processes has remained elusive. Here, we use simulation to explore the utility of the joint site frequency spectrum to estimate selection and demography simultaneously, including developing an extension of the previously proposed Jaatha program (Mathew et al., 2013). We evaluate both complete and incomplete selective sweeps under an isolation-with-migration model with and without population size change (both population growth and bottlenecks). Results suggest that while it may not be possible to precisely estimate the strength of selection, it is possible to infer the presence of selection while estimating accurate demographic parameters. We further demonstrate that the common assumption of selective neutrality when estimating demographic models may lead to severe biases. Finally, we apply the approach we have developed to better characterize the within-host demographic and selective history of human cytomegalovirus (HCMV) infection using published next generation sequencing data.

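The Mathew and Jensen (2015) statement above defines the JSFS entry J[x, y] = z as the number z of aligned positions found in x samples of P1 and y samples of P2, and then describes coarsening the JSFS into frequency-class summary statistics. The sketch below shows both steps in plain Python, reading "found in" as "the derived allele is carried by". The function names are hypothetical, and `example_class` is an invented binning scheme used only for illustration: it is NOT Jaatha's default set of 23 frequency classes, whose exact boundaries are described in the cited references.

```python
from collections import Counter


def joint_sfs(site_counts, n1, n2):
    """Unfolded JSFS: jsfs[x][y] counts sites whose derived allele is carried
    by x of the n1 samples from P1 and y of the n2 samples from P2."""
    jsfs = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for x, y in site_counts:
        if not (0 <= x <= n1 and 0 <= y <= n2):
            raise ValueError(f"site count ({x}, {y}) exceeds sample sizes ({n1}, {n2})")
        jsfs[x][y] += 1
    return jsfs


def coarsen(jsfs, class_of):
    """Sum JSFS cells into summary statistics; class_of maps a cell (x, y) to a class label."""
    stats = Counter()
    for x, row in enumerate(jsfs):
        for y, count in enumerate(row):
            stats[class_of(x, y)] += count
    return stats


def example_class(x, y, n1, n2):
    """Hypothetical scheme (NOT Jaatha's default classes): keep the extreme
    counts 0, 1, n-1 and n as separate classes and pool intermediate counts."""
    def bin_count(c, n):
        return str(c) if c in (0, 1, n - 1, n) else "mid"
    return (bin_count(x, n1), bin_count(y, n2))


if __name__ == "__main__":
    n1, n2 = 5, 5
    # Hypothetical derived-allele counts (P1, P2) at six aligned sites.
    sites = [(1, 0), (1, 0), (2, 0), (3, 0), (0, 5), (4, 4)]
    jsfs = joint_sfs(sites, n1, n2)
    stats = coarsen(jsfs, lambda x, y: example_class(x, y, n1, n2))
    print(stats[("1", "0")], stats[("mid", "0")])
```

Passing the binning rule as a function keeps the JSFS construction separate from the choice of summary statistics, so a different class scheme (such as the one described by Tellier et al. 2011) could be swapped in without touching `joint_sfs`.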
