2020
DOI: 10.1371/journal.pgen.1008619

Accounting for long-range correlations in genome-wide simulations of large cohorts

Abstract: Coalescent simulations are widely used to examine the effects of evolution and demographic history on the genetic makeup of populations. Thanks to recent progress in algorithms and data structures, simulators such as the widely-used msprime now provide genome-wide simulations for millions of individuals. However, this software relies on classic coalescent theory and its assumptions that sample sizes are small and that the region being simulated is short. Here we show that coalescent simulations of long regions…

Citations: cited by 52 publications (64 citation statements)
References: 28 publications (43 reference statements)
“…First, while coalescent simulations allow for decreased computational burden, model assumptions may result in inaccurate long-range LD, especially for whole-genome simulations. 30 However, given we only simulated chromosome 20, biases are expected to be modest. 30 We also use a case-control framework for our simulation; therefore, power and potential differences in population PRS accuracy may be even higher if a quantitative trait was used.…”
Section: Discussion (mentioning)
confidence: 99%
“… 30 However, given we only simulated chromosome 20, biases are expected to be modest. 30 We also use a case-control framework for our simulation; therefore, power and potential differences in population PRS accuracy may be even higher if a quantitative trait was used. In addition, our simulations assume random mating among admixed individuals and therefore do not reflect the more complex assortative mating that may be observed, which may impact the distribution of local ancestry tract lengths in our simulation and therefore hinder the improvement of PRS accuracy by local ancestry weighting.…”
Section: Discussion (mentioning)
confidence: 99%
“…After running Refined IBD, we used its gap-filling utility to remove gaps between segments less than 0.6 cM that had at most one discordant homozygote. We then filtered out IBD segments smaller than 5 cM, as short segments below this threshold are difficult to accurately detect 59. To visualize IBD sharing in We called ROH using GARLIC v.1.1.6 61, which implements the ROH calling pipeline of Pemberton et al. 2012 29.…”
Section: Identification of IBD Tracts and ROH (mentioning)
confidence: 99%
“…Rousset, 2000), but cannot be simulated when considering extensions of Kingman's (1982) n-coalescent which assume large sub-population sizes and small migration rates. To simulate small population sizes and high dispersal rates without biases (Nelson et al, 2020), coalescence probabilities exact for small population size (Fu, 2006) must be used in a gen-by-gen simulation until the common ancestors of the whole simulated sample have been found. Such simulations are required to assess any inference framework which might for example allow separate estimation of sub-population size, mutation and migration probabilities, something that is not possible under n-coalescent approximations.…”
Section: Introduction (mentioning)
confidence: 99%
“…As gen-by-gen algorithms are expected to be slower than those involving n-coalescent approximations, we performed simulations to check the feasibility of simulating genomic data by such algorithms, and compared computation times with those of alternative software based on n-coalescent approximations, such as msprime (Kelleher et al, 2016), FastSimcoal2 (Excoffier et al, 2013), exact coalescence algorithms implemented as DTWF in msprime python package [back-in-time Wright-Fisher simulator, Nelson et al (2020)], IBDSim (Leblois et al, 2009) and forward algorithms, such as SimBit (Matthey-Doret, 2020).…”
Section: Introduction (mentioning)
confidence: 99%