Kathryn E. Kemper scite author profile

The capacity to accurately predict an individual's phenotype from their DNA sequence is one of the great promises of genomics and precision medicine. Recently, Bayesian methods for generating polygenic predictors have been successfully applied in human genomics, but they require individual-level data, access to which is often limited by privacy or logistical concerns, and they are computationally very intensive. This has motivated methodological frameworks that utilise publicly available genome-wide association study (GWAS) summary data, which for some traits now include results from more than a million individuals. In this study, we extend the established summary statistics methodological framework to include a class of point-normal mixture prior Bayesian regression models, which have been shown to generate optimal genetic predictions and can perform heritability estimation and variant mapping and estimate the distribution of the genetic effects. In a wide range of simulations and cross-validation using 10 real quantitative traits and 1.1 million variants on 350,000 individuals from the UK Biobank (UKB), we establish that our summary-based method, SBayesR, performs similarly to methods that use the individual-level data and outperforms other state-of-the-art summary statistics methods in terms of prediction accuracy and heritability estimation at a fraction of the computational resources. We generate polygenic predictors for body mass index and height in two independent data sets and show that, by exploiting summary statistics on 1.1 million variants from the largest GWAS meta-analysis (n ≈ 700,000), the SBayesR prediction R² improved on average across traits by 6.8% relative to that estimated from an individual-level data BayesR analysis of data from the UKB (n ≈ 450,000). Compared with commonly used state-of-the-art summary-based methods, SBayesR improved the prediction R² by 4.1% relative to LDpred and by 28.7% relative to clumping and p-value thresholding. SBayesR gave comparable prediction accuracy to the recent RSS method, which has a similar model, but at a computational time that is two orders of magnitude smaller. The methodology is implemented in a very efficient and user-friendly software tool titled GCTB.

Introduction

The capacity to accurately predict an individual's phenotype from their DNA sequence is one of the great promises of genomics and precision medicine [1-5], recognising that the accuracy of a genetic risk predictor is dependent on the genetic contribution to variation in the trait. It is anticipated that genetic risk prediction will be useful for informing early disease intervention and aiding diagnosis by identifying individuals with an increased genetic risk of disease [5-7]. Accurate genetic predictors for complex traits and disorders are currently limited, due mainly to an incomplete understanding of complex genetic variation, small training sample sizes and suboptimal modelling [4,8,9]. Through large consortia and biobank initiatives, sample sizes for gen...


Meta-analysis of genome-wide association studies for height and body mass index in ∼700,000 individuals of European ancestry

Yengo

Sidorenko

Kemper

et al. 2018

Preprint

129

162


Identifying gene targets for brain-related traits using transcriptomic and methylomic data from blood

Zeng

Zhang

et al. 2018

Preprint


Understanding the difference in genetic regulation of gene expression between brain and blood is important for discovering genes associated with brain-related traits and disorders. Here, we estimate the correlation of genetic effects at the top associated cis-expression or cis-methylation quantitative trait loci (cis-eQTLs or cis-mQTLs) between brain and blood for genes expressed (or CpG sites methylated) in both tissues, while accounting for errors in their estimated effects (r_b). Using publicly available data (n = 72 to 1,366), we find that the genetic effects of cis-eQTLs (P_eQTL < 5 × 10^-8) or mQTLs (P_mQTL < 1 × 10^-10) are highly correlated between independent brain and blood samples (r_b = 0.70 with SE = 0.015 for cis-eQTLs and r_b = 0.78 with SE = 0.006 for cis-mQTLs). Using meta-analyzed brain eQTL/mQTL data (n = 526 to 1,194), we identify 61 genes and 167 DNA methylation (DNAm) sites associated with 4 brain-related traits and disorders. Most of these associations are a subset of the discoveries (97 genes and 295 DNAm sites) using data from blood with larger sample sizes (n = 1,980 to 14,115). We further find that cis-eQTLs with tissue-specific effects are approximately uniformly distributed across all the functional annotation categories, and that the mean difference in gene expression level between brain and blood is almost independent of the difference in the corresponding cis-eQTL effect. Our results demonstrate the gain of power in gene discovery for brain-related phenotypes using blood cis-eQTL or cis-mQTL data with large sample sizes.

This study makes use of data from dbGaP (accessions: phs000428.v1.p1 and phs000424.v6.p1), UK Biobank Resource (application number: 12514), UK10K project and CommonMind Consortium. A full list of acknowledgements to these data sets can be found in Supplementary Note.


A resource-efficient tool for mixed model association analysis of large-scale data

Jiang

Zheng

et al. 2019

Preprint


The genome-wide association study (GWAS) has been widely used as an experimental design to detect associations between genetic variants and a phenotype. Two major confounding factors, population stratification and relatedness, could potentially lead to inflated GWAS test-statistics and thereby spurious associations. Mixed linear model (MLM)-based approaches can be used to account for sample structure. However, genome-wide association (GWA) analyses in biobank samples such as the UK Biobank (UKB) often exceed the capability of most existing MLM-based tools especially if the number of traits is large. Here, we developed an MLM-based tool (called fastGWA) that controls for population stratification by principal components and relatedness by a sparse genetic relationship matrix for GWA analyses of biobank-scale data. We demonstrated by extensive simulations that fastGWA is reliable, robust and highly resource-efficient. We then applied fastGWA to 2,173 traits on 456,422 array-genotyped and imputed individuals and 2,048 traits on 46,191 whole-exome-sequenced individuals in the UKB.
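The idea of handling relatedness through a sparse genetic relationship matrix (GRM) can be illustrated with a minimal sketch. This is not the fastGWA implementation: the function name `sparse_grm`, the standardisation by sample allele frequency and the cut-off of 0.05 are all assumptions made for illustration, since the abstract does not state how the matrix is built or thresholded.

```python
import numpy as np


def sparse_grm(genotypes, threshold=0.05):
    """Build a GRM and zero out off-diagonal entries below `threshold`.

    genotypes: (individuals x variants) array of 0/1/2 allele counts.
    threshold: hypothetical cut-off, chosen here for illustration only.
    """
    g = np.asarray(genotypes, dtype=float)
    freq = g.mean(axis=0) / 2.0               # sample allele frequencies
    sd = np.sqrt(2.0 * freq * (1.0 - freq))   # binomial standard deviation
    keep = sd > 0                             # drop monomorphic variants
    z = (g[:, keep] - 2.0 * freq[keep]) / sd[keep]
    grm = z @ z.T / z.shape[1]
    # Keep the diagonal; set small (and negative) off-diagonals to zero.
    off_diagonal = ~np.eye(grm.shape[0], dtype=bool)
    grm[off_diagonal & (grm < threshold)] = 0.0
    return grm
```

In a sample of mostly unrelated individuals, nearly all off-diagonal entries fall below the cut-off, so the matrix that remains is dominated by zeros and can be stored and used far more cheaply than a dense one.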


Widespread signatures of natural selection across human complex traits and functional genomic categories

et al. 2021


Understanding how natural selection has shaped genetic architecture of complex traits is of importance in medical and evolutionary genetics. Bayesian methods have been developed using individual-level GWAS data to estimate multiple genetic architecture parameters including selection signature. Here, we present a method (SBayesS) that only requires GWAS summary statistics. We analyse data for 155 complex traits (n = 27k–547k) and project the estimates onto those obtained from evolutionary simulations. We estimate that, on average across traits, about 1% of human genome sequence are mutational targets with a mean selection coefficient of ~0.001. Common diseases, on average, show a smaller number of mutational targets and have been under stronger selection, compared to other traits. SBayesS analyses incorporating functional annotations reveal that selection signatures vary across genomic regions, among which coding regions have the strongest selection signature and are enriched for both the number of associated variants and the magnitude of effect sizes.


Imprint of Assortative Mating on the Human Genome

Yengo

Robinson

Kemper

et al. 2018

Preprint


Non-random mate choice with respect to complex traits is widely observed in humans, but whether this reflects true phenotypic assortment, environment (social homogamy) or convergence after choosing a partner is not known. Understanding the causes of mate choice is important because assortative mating (AM), if based upon heritable traits, has genetic and evolutionary consequences. AM is predicted under Fisher's classical theory [1] to induce a signature in the genome at trait-associated loci that can be detected and quantified. Here, we develop and apply a method to quantify AM on a specific trait by estimating the correlation (θ) between genetic predictors of the trait from SNPs on odd versus even chromosomes. We show by theory and simulation that the effect of AM can be distinguished from population stratification. We applied this approach to 32 complex traits and diseases using SNP data from ~400,000 unrelated individuals of European ancestry. We found significant evidence of AM for height (θ = 3.2%) and educational attainment (θ = 2.7%), both consistent with theoretical predictions. Overall, our results imply that AM involves multiple traits, affects the genomic architecture of loci that are associated with these traits and that the consequence of mate choice can be detected from a random sample of genomes.

bioRxiv preprint first posted online Apr. 13, 2018; doi: http://dx.doi.org/10.1101/300020. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Non-random mating in natural populations has short- and long-term evolutionary consequences. In many species, including humans, mate choice is often associated with phenotypic similarities between mates [2,3]. Such phenotypic similarities have multiple sources, for example social homogamy (the preference for a mate from the same environment) or primary assortment on certain traits observable at the time of mate choice. In contrast to social homogamy, primary phenotypic assortment, here referred to as assortative mating (AM), has genetic and evolutionary consequences and is therefore the focus of our study. In humans, AM involves multiple complex traits [4-8] and can sometimes lead to similar susceptibility to diseases [9-12]. The genetic effects of AM were first studied in the seminal articles of Fisher (1918) [1] and Wright (1921) [13]. Those two founding contributions, further complemented by Crow & Kimura (1970) [14] and others [15-17], have set the basis of the theory of AM on complex traits. AM theory predicts three main genetic consequences of a positive correlation between the phenotypes of mates in a population: (i) an increase of the genetic variance in the population, (ii) an increase in the correlation between relatives and (iii) an increase of homozygosity (deviation from Hardy-Weinberg Equilibrium; HWE), in particular at causal loci. These seemingly distinct c...
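The odd-versus-even chromosome correlation described in the abstract can be sketched as follows. This is a minimal illustration only, not the authors' estimator: the function name `odd_even_correlation`, the use of simple weighted allele-count scores and the plain Pearson correlation are assumptions, and the published method may weight, adjust or correct the predictors differently.

```python
import numpy as np


def odd_even_correlation(genotypes, effects, chromosomes):
    """Correlate genetic predictors built from odd vs even chromosomes.

    genotypes:   (individuals x SNPs) array of allele counts.
    effects:     per-SNP effect estimates for the trait.
    chromosomes: chromosome number (1-22) of each SNP.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    effects = np.asarray(effects, dtype=float)
    chromosomes = np.asarray(chromosomes)
    odd = chromosomes % 2 == 1
    # One predictor per individual from each half of the genome.
    score_odd = genotypes[:, odd] @ effects[odd]
    score_even = genotypes[:, ~odd] @ effects[~odd]
    return float(np.corrcoef(score_odd, score_even)[0, 1])
```

Under random mating, alleles on different chromosomes are expected to be independent, so the two scores should be uncorrelated; a positive correlation is the kind of signature the abstract attributes to AM.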


Genetic architecture of body size in mammals

Kemper

Visscher

Goddard

2012

Genome Biol


Body size, as measured by height in humans or weight in domestic species, is an archetypical quantitative or complex trait that shows continuous variation. It has been extensively recorded and studied for over a century because of its importance to ecology, its relevance in farming, and because it is an important indicator of human growth and health [1]. The genetic architecture underlying body size was initially uncertain and Fisher proposed an infinitesimal model that was successfully applied for many years [2]. This model, with an infinite number of loci, each with infinitesimal effect, is not literally true but it does provide a good fit to the data. In more recent times the infinitesimal model has gradually been replaced by a finite number of loci, each with discrete mutations. However, observations now form almost two disjointed sets: one set in which individual mutations have large effects (that is, so-called Mendelian traits) and another set where variants have small effects. This review attempts to bridge the gap between these two sets of observations using body size as an example of an extensively studied complex trait in mammalian species.

The genetic architecture underlying variation in complex traits is currently a topic of extensive debate. This is particularly true for human complex diseases but also for agriculture because of its impact in predicting future phenotypes (for example, [3-6]). Primarily it is the number, size and frequency of mutations that are under the most scrutiny. Taking the human disease example, some argue for a common disease common variant hypothesis where genetic susceptibility to disease is the result of many relatively high-frequency mutations each with small effect on disease susceptibility. However, others argue for a rare variant common disease hypothesis where many low-frequency mutations have large effects. As we shall see, observations on the genetic architecture underlying body size for humans and other mammals provide evidence for both hypotheses. Our discussion begins by describing the number, frequency and size of mutations with large effects for humans, mice and domesticated species. We then move on to genome-wide association studies (GWASs) that have investigated segregating variation in these species. We find evidence for moderate-to-large effect mutations in domestic species but highlight that this category of mutations goes undetected in human studies. Finally, we apply simple evolutionary theory to explain the observed distribution of mutation effects for human stature. Our model implies that most of the segregating variation in human height is caused by mutations with small-to-moderate effects.

Variants of large effect

Family studies in humans

Identification of causative mutations for so-called Mendelian traits has been possible by studying the segregation within families of mutations and phenotypes. Such mutations must have large effects so that individuals can be classified into genotype classes using their phenotype despite the background variation caused by ot...


Effectiveness and cost of multilayered colorectal cancer screening promotion interventions at federally qualified health centers in Washington State

Kemper

Glaze

Eastman

et al. 2018

Cancer


BACKGROUND: It has been demonstrated that fecal immunochemical test (FIT) mailing programs are effective for increasing colorectal cancer (CRC) screening. The objectives of the current study were to assess the magnitude of uptake that could be achieved with a mailed FIT program in a federally qualified health center and whether such a program can be implemented at a reasonable cost to support sustainability. METHODS: The Washington State Department of Health’s partner HealthPoint implemented a direct-mail FIT program at 9 medical clinics, along with a follow-up reminder letter and automated telephone calls to those not up-to-date with recommended screening. Supplemental outreach events at selected medical clinics and a 50th birthday card screening reminder program also were implemented. The authors collected and analyzed process, effectiveness, and cost measures and conducted a systematic assessment of the short-term cost effectiveness of the interventions. RESULTS: Overall, 5178 FIT kits were mailed with 4009 follow-up reminder letters, and 8454 automated reminder telephone calls were made over 12 months. In total, 1607 FIT kits were returned within 3 months of the end of the implementation period: an overall return rate of 31% for the mailed FIT program. The average total intervention cost per FIT kit returned was $39.81, and the intervention implementation cost per kit returned was $18.76. CONCLUSIONS: The mailed FIT intervention improved CRC screening uptake among HealthPoint’s patient population. This intervention was implemented for less than $40 per individual successfully screened. The findings and lessons learned can assist other clinics that serve disadvantaged populations to increase their CRC screening adherence.
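The reported return rate can be checked directly from the counts in the RESULTS, as in the short calculation below. The two implied totals are back-calculated from the rounded per-kit costs and are not figures reported in the abstract; they are shown only to make the scale of the programme concrete, and all variable names are illustrative.

```python
# Figures quoted in the abstract.
kits_mailed = 5178
kits_returned = 1607
total_cost_per_returned = 39.81           # USD per FIT kit returned
implementation_cost_per_returned = 18.76  # USD per FIT kit returned

# Return rate for the mailed FIT program.
return_rate = kits_returned / kits_mailed

# Implied totals (assumption: per-kit cost x kits returned; not reported).
implied_total_cost = kits_returned * total_cost_per_returned
implied_implementation_cost = kits_returned * implementation_cost_per_returned

print(f"Return rate: {return_rate:.1%}")
print(f"Implied total cost: ${implied_total_cost:,.2f}")
print(f"Implied implementation cost: ${implied_implementation_cost:,.2f}")
```

The return rate of roughly 31% matches the figure stated in the abstract.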



Kathryn E. Kemper

Improved polygenic prediction by Bayesian multiple regression on summary statistics

Meta-analysis of genome-wide association studies for height and body mass index in ∼700,000 individuals of European ancestry

Identifying gene targets for brain-related traits using transcriptomic and methylomic data from blood

A resource-efficient tool for mixed model association analysis of large-scale data

Widespread signatures of natural selection across human complex traits and functional genomic categories

Imprint of Assortative Mating on the Human Genome

Genetic architecture of body size in mammals

Effectiveness and cost of multilayered colorectal cancer screening promotion interventions at federally qualified health centers in Washington State
