Accounting for long-range correlations in genome-wide simulations of large cohorts

Nelson, Dominic; Kelleher, Jerome; Ragsdale, Aaron P.; Moreau, Claudia; McVean, Gil; Gravel, Simon

doi:10.1371/journal.pgen.1008619

Cited by 52 publications

(64 citation statements)

References 28 publications

“…First, while coalescent simulations allow for decreased computational burden, model assumptions may result in inaccurate long-range LD, especially for whole-genome simulations. 30 However, given we only simulated chromosome 20, biases are expected to be modest. 30 We also use a case-control framework for our simulation; therefore, power and potential differences in population PRS accuracy may be even higher if a quantitative trait was used.…”

Section: Discussion (mentioning)

confidence: 99%

“… 30 However, given we only simulated chromosome 20, biases are expected to be modest. 30 We also use a case-control framework for our simulation; therefore, power and potential differences in population PRS accuracy may be even higher if a quantitative trait was used. In addition, our simulations assume random mating among admixed individuals and therefore do not reflect the more complex assortative mating that may be observed, which may impact the distribution of local ancestry tract lengths in our simulation and therefore hinder the improvement of PRS accuracy by local ancestry weighting.…”

Section: Discussion (mentioning)

confidence: 99%

Inclusion of variants discovered from diverse populations improves polygenic risk score transferability

Cavazos

Witte

2021

Human Genetics and Genomics Advances

Summary The majority of polygenic risk scores (PRSs) have been developed and optimized in individuals of European ancestry and may have limited generalizability across other ancestral populations. Understanding aspects of PRSs that contribute to this issue and determining solutions is complicated by disease-specific genetic architecture and limited knowledge of sharing of causal variants and effect sizes across populations. Motivated by these challenges, we undertook a simulation study to assess the relationship between ancestry and the potential bias in PRSs developed in European ancestry populations. Our simulations show that the magnitude of this bias increases with increasing divergence from European ancestry, and this is attributed to population differences in linkage disequilibrium and allele frequencies of European-discovered variants, likely as a result of genetic drift. Importantly, we find that including into the PRS variants discovered in African ancestry individuals has the potential to achieve unbiased estimates of genetic risk across global populations and admixed individuals. We confirm our simulation findings in an analysis of hemoglobin A1c (HbA1c), asthma, and prostate cancer in the UK Biobank. Given the demonstrated improvement in PRS prediction accuracy, recruiting larger diverse cohorts will be crucial—and potentially even necessary—for enabling accurate and equitable genetic risk prediction across populations.

“…After running Refined IBD, we used its gap-filling utility to remove gaps between segments less than 0.6 cM that had at most one discordant homozygote. We then filtered out IBD segments smaller than 5 cM, as short segments below this threshold are difficult to accurately detect 59. To visualize IBD sharing in We called ROH using GARLIC v.1.1.6 61, which implements the ROH calling pipeline of Pemberton et al 2012 29.…”

Section: Identification of IBD Tracts and ROH (mentioning)

confidence: 99%

Sex-biased admixture and assortative mating shape genetic variation and influence demographic inference in admixed Cabo Verdeans

Korunes

Souza

Bobrek

et al. 2020

Preprint

Genetic data can provide insights into population history, but first we must understand the patterns that complex histories leave in genomes. Here, we consider the admixed human population of Cabo Verde to understand the patterns of genetic variation left by social and demographic processes. First settled in the late 1400s, Cabo Verdeans are admixed descendants of Portuguese colonizers and enslaved West African people. We consider Cabo Verde's well-studied historical record alongside genome-wide SNP data from 563 individuals from 4 regions within the archipelago. We use genetic ancestry to test for patterns of nonrandom mating and sex-specific gene flow, and we examine the consequences of these processes for common demographic inference methods and for genetic patterns. Notably, multiple population genetic tools that assume random mating underestimate the timing of admixture, but incorporating non-random mating produces estimates more consistent with historical records. We consider how admixture interrupts common summaries of genomic variation such as runs-of-homozygosity (ROH). While summaries of ROH may be difficult to interpret in admixed populations, differentiating ROH by length class shows that ROH reflect historical differences between the islands in their contributions from the source populations and post-admixture population dynamics. Finally, we find higher African ancestry on the X chromosome than on the autosomes, consistent with an excess of European males and African females contributing to the gene pool. Considering these genomic insights into population history in the context of Cabo Verde's historical record, we can identify how assumptions in genetic models impact inference of population history more broadly.

“…Rousset, 2000), but cannot be simulated when considering extensions of Kingman's (1982) n-coalescent which assume large sub-population sizes and small migration rates. To simulate small population sizes and high dispersal rates without biases (Nelson et al, 2020), coalescence probabilities exact for small population size (Fu, 2006) must be used in a gen-by-gen simulation until the common ancestors of the whole simulated sample have been found. Such simulations are required to assess any inference framework which might for example allow separate estimation of sub-population size, mutation and migration probabilities, something that is not possible under n-coalescent approximations.…”

Section: Introduction (mentioning)

confidence: 99%
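
The contrast drawn in the statement above, between coalescence probabilities that are exact for small populations and the n-coalescent approximation, can be illustrated with a small numerical sketch. It assumes a haploid Wright-Fisher population in which each of n lineages picks a parent uniformly at random among N individuals; this is a textbook simplification chosen for illustration, not the formulas of Fu (2006) and not the algorithm used by GSpace or msprime. The function names are hypothetical.

```python
from math import comb


def p_no_coalescence_exact(n: int, N: int) -> float:
    """Exact one-generation probability that n lineages choose n distinct
    parents among N haploid individuals (assumed Wright-Fisher model)."""
    p = 1.0
    for i in range(1, n):
        p *= 1.0 - i / N
    return p


def p_coalescence_kingman(n: int, N: int) -> float:
    """Large-N approximation: only pairwise mergers are counted, giving a
    per-generation coalescence probability of C(n, 2) / N."""
    return comb(n, 2) / N


if __name__ == "__main__":
    # Compare a large population with a small one for the same sample size.
    for n, N in [(10, 10_000), (10, 50)]:
        exact = 1.0 - p_no_coalescence_exact(n, N)
        approx = p_coalescence_kingman(n, N)
        print(f"n={n}, N={N}: exact={exact:.4f}, approximation={approx:.4f}")
```

Under these assumptions the two quantities nearly agree when N is much larger than n, and the approximation overstates the per-generation coalescence probability when n is comparable to N, which is the regime the statement describes.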

“…As gen-by-gen algorithms are expected to be slower than those involving n-coalescent approximations, we performed simulations to check the feasibility of simulating genomic data by such algorithms, and compared computation times with those of alternative software based on n-coalescent approximations, such as msprime (Kelleher et al, 2016), FastSimcoal2 (Excoffier et al, 2013), exact coalescence algorithms implemented as DTWF in msprime python package [back-in-time Wright-Fisher simulator, Nelson et al (2020)], IBDSim (Leblois et al, 2009) and forward algorithms, such as SimBit (Matthey-Doret, 2020).…”

Section: Introduction (mentioning)

confidence: 99%

GSpace: an exact coalescence simulator of recombining genomes under isolation by distance

et al. 2021

Motivation: Simulation-based inference can bypass the limitations of statistical methods based on analytical approximations, but software allowing simulation of structured population genetic data without the classical n-coalescent approximations (such as those following from assuming large population size) is scarce or slow. Results: We present GSpace, a simulator for genomic data, based on a generation-by-generation coalescence algorithm taking into account small population size, recombination, and isolation by distance. Availability: Freely available from the INRAe website (http://www1.montpellier.inra.fr/CBGP/software/gspace/download.html)
