Marie Forest scite author profile

Knowledge of genome-wide genealogies for thousands of individuals would simplify most evolutionary analyses for humans and other species, but has remained computationally infeasible. We developed a method, Relate, scaling to >10,000 sequences while simultaneously estimating branch lengths, mutational ages, and variable historical population sizes, as well as allowing for data errors. Application to 1000 Genomes Project haplotypes produces joint genealogical histories for 26 human populations. Highly diverged lineages are present in all groups, but most frequent in Africa. Outside Africa, these mainly reflect ancient introgression from groups related to Neanderthals and Denisovans, while African signals instead reflect unknown events, unique to that continent. Our approach allows more powerful inferences of natural selection than previously possible. We identify multiple novel regions under strong positive selection, and multi-allelic traits including hair colour, BMI, and blood pressure, showing strong evidence of directional selection, varying among human groups.

Large-scale genetic variation datasets are now available for a variety of species, including tens of thousands of humans. In principle, all information about a sample's genetic history is captured by their underlying genealogical history, which records the historical coalescence, recombination, and mutation events that produced the observed variation patterns. In practice, several key existing approaches (e.g., Refs. [1,2]) leverage an underlying coalescent model, because this provides a flexible modelling framework and is the limiting behaviour of a variety of finite-population models [3,4]. However, inference under the coalescent is complicated by the structure of the model, uncertainty over the correct genealogy conditional on observed data, and the large resulting space of possible sample histories [5]. Other approaches [6-11] use more heuristic approximations to the coalescent, sometimes reducing accuracy: regardless, all published existing methods scale to tens or a few hundred samples at most.

As a result of these issues, the use of direct genealogy-based inference to detect recombination events, date mutations, and reveal evidence of positive selection has been limited to smaller datasets [1,2], while for larger datasets approaches based on data summaries [12-14] or downsampling [15,16] have predominated. A diverse set of tools have detected genetic structure that is in good agreement with geopolitical separation over generations [17]. Admixtures of ancient populations have been identified and dated [18]. Other applications have found bottlenecks in population sizes that are consistent with anthropological evidence of initial human migration from the African continent [15,19-21] and evidence of subsequent introgression with archaic humans, such as Neanderthals [22].

We have developed a scalable method, Relate, to estimate genome-wide genealogies (see Figure 1; Methods; URLs for implementation). Relate separates two steps; firstly identifying a genealogical fra...

Improved prediction of fracture risk leveraging a genome-wide polygenic risk score

Forgetta, Keller-Baruch, et al. 2021. Genome Med

Background: Accurately quantifying the risk of osteoporotic fracture is important for directing appropriate clinical interventions. While skeletal measures such as heel quantitative speed of sound (SOS) and dual-energy X-ray absorptiometry bone mineral density are able to predict the risk of osteoporotic fracture, the utility of such measurements is subject to the availability of equipment and human resources. Using data from 341,449 individuals of white British ancestry, we previously developed a genome-wide polygenic risk score (PRS), called gSOS, that captured 25.0% of the total variance in SOS. Here, we test whether gSOS can improve fracture risk prediction.

Methods: We examined the predictive power of gSOS in five genome-wide genotyped cohorts, including 90,172 individuals of European ancestry and 25,034 individuals of Asian ancestry. We calculated gSOS for each individual and tested for the association between gSOS and incident major osteoporotic fracture and hip fracture. We tested whether adding gSOS to the risk prediction models had added value over models using other commonly used clinical risk factors.

Results: A standard deviation decrease in gSOS was associated with an increased odds of incident major osteoporotic fracture in populations of European ancestry, with odds ratios ranging from 1.35 to 1.46 in four cohorts. It was also associated with a 1.26-fold (95% confidence interval (CI) 1.13–1.41) increased odds of incident major osteoporotic fracture in the Asian population. We demonstrated that gSOS was more predictive of incident major osteoporotic fracture (area under the receiver operating characteristic curve (AUROC) = 0.734; 95% CI 0.727–0.740) and incident hip fracture (AUROC = 0.798; 95% CI 0.791–0.805) than most traditional clinical risk factors, including prior fracture, use of corticosteroids, rheumatoid arthritis, and smoking. We also showed that adding gSOS to the Fracture Risk Assessment Tool (FRAX) could refine the risk prediction with a positive net reclassification index ranging from 0.024 to 0.072.

Conclusions: We generated and validated a PRS for SOS which was associated with the risk of fracture. This score was more strongly associated with the risk of fracture than many clinical risk factors and provided an improvement in risk prediction. gSOS should be explored as a tool to improve risk stratification to identify individuals at high risk of fracture.
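
The net reclassification index reported above can be illustrated with a minimal sketch. The abstract does not say which NRI variant or risk thresholds were used, so this assumes the common two-category form (high risk versus not high risk): the net proportion of events moved up by the new model plus the net proportion of non-events moved down. The function name and the toy inputs are hypothetical.

```python
from typing import Sequence


def net_reclassification_index(
    old_high: Sequence[bool],
    new_high: Sequence[bool],
    event: Sequence[bool],
) -> float:
    """Two-category NRI (an assumed variant, not taken from the paper).

    old_high / new_high: whether each person is classed as high risk by
    the baseline model and by the model with the added predictor.
    event: whether each person went on to have the outcome.
    """
    up_e = down_e = n_e = 0  # movements among people with the event
    up_n = down_n = n_n = 0  # movements among people without the event
    for old, new, had_event in zip(old_high, new_high, event):
        moved_up = new and not old
        moved_down = old and not new
        if had_event:
            n_e += 1
            up_e += moved_up
            down_e += moved_down
        else:
            n_n += 1
            up_n += moved_up
            down_n += moved_down
    if n_e == 0 or n_n == 0:
        raise ValueError("need at least one event and one non-event")
    # Correct moves: up for events, down for non-events.
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n
```

A positive value means the added predictor moved more people in the correct direction than in the wrong one, which is the sense in which the abstract calls the index positive.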

Development of a polygenic risk score to improve screening for fracture risk: A genetic risk prediction study

et al. 2020

Gene networks show associations with seed region connectivity

Forest, Iturria-Medina, Goldman, et al. 2017. Human Brain Mapping

Primary patterns in adult brain connectivity are established during development by coordinated networks of transiently expressed genes; however, neural networks remain malleable throughout life. The present study hypothesizes that structural connectivity from key seed regions may induce effects on their connected targets, which are reflected in gene expression at those targeted regions. To test this hypothesis, analyses were performed on data from two brains from the Allen Human Brain Atlas, for which both gene expression and DW-MRI were available. Structural connectivity was estimated from the DW-MRI data and an approach motivated by network topology, that is, weighted gene coexpression network analysis (WGCNA), was used to cluster genes with similar patterns of expression across the brain. Group exponential lasso models were then used to predict gene cluster expression summaries as a function of seed region structural connectivity patterns. In several gene clusters, brain regions located in the brain stem, diencephalon, and hippocampal formation were identified that have significant predictive power for these expression summaries. These connectivity-associated clusters are enriched in genes associated with synaptic signaling and brain plasticity. Furthermore, using seed region based connectivity provides a novel perspective in understanding relationships between gene expression and connectivity. Hum Brain Mapp 38:3126-3140, 2017. © 2017 Wiley Periodicals, Inc.

PRS-on-Spark (PRSoS): a novel, efficient and flexible approach for generating polygenic risk scores

et al. 2018

Background: Polygenic risk scores (PRS) describe the genomic contribution to complex phenotypes and consistently account for a larger proportion of variance in outcome than single nucleotide polymorphisms (SNPs) alone. However, there is little consensus on the optimal data input for generating PRS, and existing approaches largely preclude the use of imputed posterior probabilities and strand-ambiguous SNPs, i.e., A/T or C/G polymorphisms. Our ability to predict complex traits that arise from the additive effects of a large number of SNPs would likely benefit from a more inclusive approach.

Results: We developed PRS-on-Spark (PRSoS), software implemented in Apache Spark and Python that accommodates different data inputs and strand-ambiguous SNPs to calculate PRS. We compared performance between PRSoS and existing software (PRSice v1.25) for generating PRS for major depressive disorder using a community cohort (N = 264). We found PRSoS to perform faster than PRSice v1.25 when PRS were generated for a large number of SNPs (~17 million SNPs; t = 42.865, p = 5.43E-04). We also show that the use of imputed posterior probabilities and the inclusion of strand-ambiguous SNPs increase the proportion of variance explained by a PRS for major depressive disorder (from 4.3% to 4.8%).

Conclusions: PRSoS provides the user with the ability to generate PRS using an inclusive and efficient approach that considers a larger number of SNPs than conventional approaches. We show that a PRS for major depressive disorder that includes strand-ambiguous SNPs, calculated using PRSoS, accounts for the largest proportion of variance in symptoms of depression in a community cohort, demonstrating the utility of this approach. The availability of this software will help users develop more informative PRS for a variety of complex phenotypes.

Electronic supplementary material: The online version of this article (10.1186/s12859-018-2289-9) contains supplementary material, which is available to authorized users.
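
To make the two inputs mentioned in this abstract concrete, here is a minimal plain-Python sketch of a PRS as a weighted sum of effect-allele dosages, where each dosage is the expected allele count under imputed genotype posterior probabilities. This is not PRSoS code: PRSoS runs on Apache Spark, and its actual interface, file formats, and any normalisation by SNP count are not described here. All function names below are hypothetical.

```python
from typing import Sequence


def expected_dosage(p_hom_other: float, p_het: float, p_hom_effect: float) -> float:
    """Expected effect-allele count (0 to 2) from genotype posteriors."""
    total = p_hom_other + p_het + p_hom_effect
    if abs(total - 1.0) > 1e-6:
        raise ValueError("genotype posterior probabilities must sum to 1")
    return p_het + 2.0 * p_hom_effect


def is_strand_ambiguous(allele1: str, allele2: str) -> bool:
    """True for A/T or C/G SNPs, whose alleles read the same on both strands."""
    return {allele1.upper(), allele2.upper()} in ({"A", "T"}, {"C", "G"})


def polygenic_risk_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Sum of per-SNP effect-allele dosage times effect-size weight."""
    if len(dosages) != len(weights):
        raise ValueError("need exactly one weight per SNP")
    return sum(d * w for d, w in zip(dosages, weights))
```

Using posterior-weighted dosages rather than hard genotype calls keeps uncertain imputed SNPs in the score instead of discarding them, which matches the inclusive approach the abstract argues for.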

Agreement in DNA methylation levels from the Illumina 450K array across batches, tissues, and time

et al. 2018

Epigenome-wide association studies (EWAS) have focused primarily on DNA methylation as a chemically stable and functional epigenetic modification. However, the stability and accuracy of the measurement of methylation in different tissues and extraction types is still being actively studied, and the longitudinal stability of DNA methylation in commonly studied peripheral tissues is of great interest. Here, we used data from two studies, three tissue types, and multiple time points to assess the stability of DNA methylation measured with the Illumina Infinium HumanMethylation450 BeadChip array. Redundancy analysis enabled visual assessment of agreement of replicate samples overall and showed good agreement after removing effects of tissue type, age, and sex. At the probe level, analysis of variance contrasts separating technical and biological replicates clearly showed better agreement between technical replicates versus longitudinal samples, and suggested increased stability for buccal cells versus blood or blood spots. Intraclass correlations (ICCs) demonstrated that inter-individual variability is of similar magnitude to within-sample variability at many probes; however, as inter-individual variability increased, so did ICC. Furthermore, we were able to demonstrate decreasing agreement in methylation levels with time, despite a maximal sampling interval of only 576 days. Finally, at 6 popular candidate genes, there was a large range of stability across probes. Our findings highlight important sources of technical and biological variation in DNA methylation across different tissues over time. These data will help to inform longitudinal sampling strategies of future EWAS.
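
The relationship between inter-individual variability and ICC described above can be seen in a small sketch. The abstract does not state which ICC form was used, so this assumes the one-way random-effects ICC for a single probe, computed from between-individual and within-individual mean squares; the function name and toy methylation values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def icc_oneway(groups: Sequence[Sequence[float]]) -> float:
    """One-way random-effects ICC (an assumed form, not taken from the paper).

    groups: one inner sequence per individual, holding that individual's
    replicate methylation values for a single probe. Every individual
    must have the same number of replicates (at least two).
    """
    n = len(groups)
    if n < 2:
        raise ValueError("need at least two individuals")
    k = len(groups[0])
    if k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need the same number (>= 2) of replicates per individual")
    grand = mean([x for g in groups for x in g])
    group_means = [mean(g) for g in groups]
    # Between-individual and within-individual mean squares.
    msb = k * sum((m - grand) ** 2 for m in group_means) / (n - 1)
    msw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g) / (n * (k - 1))
    denom = msb + (k - 1) * msw
    if denom == 0:
        raise ValueError("no variation in the data")
    return (msb - msw) / denom
```

When individuals differ little from one another, the between-individual mean square is small relative to the within-individual one and the ICC falls, even if replicate measurements are equally noisy. That is consistent with the observation that ICC rose as inter-individual variability increased.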

Machine Learning to Predict Osteoporotic Fracture Risk from Genotypes

Forgetta, Keller-Baruch, Forest, et al. 2018. Preprint

Background: Genomics-based prediction could be useful since genome-wide genotyping costs less than many clinical tests. We tested whether machine learning methods could provide a clinically relevant genomic prediction of quantitative ultrasound speed of sound (SOS), a risk factor for osteoporotic fracture.

Methods: We used 341,449 individuals from UK Biobank with SOS measures to develop genomically-predicted SOS (gSOS) using machine learning algorithms. We selected the optimal algorithm in 5,335 independent individuals and then validated it and its ability to predict incident fracture in an independent test dataset (N = 80,027). Finally, we explored whether genomic prescreening could complement a UK-based osteoporosis screening strategy, based on the validated tool FRAX.

Results: gSOS explained 4.8-fold more variance in SOS than FRAX clinical risk factors (CRF) alone (r² = 23% vs. 4.8%). A standard deviation decrease in gSOS, adjusting for the CRF-FRAX score, was associated with a greater increase in the odds of incident major osteoporotic fracture (1,491 cases / 78,536 controls, OR = 1.91 [1.70-2.14], P = 10^-28) than that for measured SOS (OR = 1.60 [1.50-1.69], P = 10^-52) and femoral neck bone mineral density (147 cases / 4,594 controls, OR = 1.53 [1.27-1.83], P = 10^-6). Individuals in the bottom decile of the gSOS distribution had a 3.25-fold increased risk of major osteoporotic fracture (P = 10^-18) compared to the top decile. A gSOS-based FRAX score identified individuals at high risk for incident major osteoporotic fractures better than the CRF-FRAX score (P = 10^-14). Introducing a genomic prescreening step into osteoporosis screening in 4,741 individuals reduced the number of required clinical visits from 2,455 to 1,273 and the number of BMD tests from 1,013 to 473, while only reducing the sensitivity to identify individuals eligible for therapy from 99% to 95%.

Interpretation: The use of genotypes in a machine learning algorithm resulted in a clinically relevant prediction of SOS and fracture, with potential to impact healthcare resource utilization.


A method for genome-wide genealogy estimation for thousands of samples
