Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies

Zhou, Wei; Nielsen, Jonas B.; Fritsche, Lars G.; Dey, Rounak; Gabrielsen, Maiken Elvestad; Wolford, Brooke N.; LeFaive, Jonathon; VandeHaar, Peter; Gagliano, Sarah A.; Gifford, Aliya; Bastarache, Lisa; Wei, Wei‐Qi; Denny, Joshua C.; Lin, Maoxuan; Hveem, Kristian; Kang, Hyun Min; Abecasis, Gonçalo R.; Willer, Cristen J.; Lee, Seunggeun

doi:10.1038/s41588-018-0184-y

Cited by 1,008 publications

(1,000 citation statements)

References 29 publications

Supporting

Mentioning

907

Contrasting

Unclassified


“…Most of the methods are based on the liability threshold model (Blangero et al, 2001; Golan et al, 2014; Loh et al, 2015; Weissbrod et al, 2018; Yang et al, 2011) and GLMM based on the logit link is a possible alternative of a disease model (Wang et al, 2015). However, the relative proportion of variances attributable to the polygenic effects cannot be defined for GLMM using the logit link (Chen et al, 2016; Papachristou et al, 2011; Zhou et al, 2018). Heritability shows an important utility for genetic epidemiology; however, heritability estimation of dichotomous phenotypes can be extremely complicated due to ascertainment bias.…”

Section: Discussion (mentioning)

confidence: 99%

“…Haseman-Elston regression is a well-known method for estimating variance components, and by restricting the phenotypic variance to 1, the heritability can be estimated as the sum of the coefficient estimates of the kinship matrix (Golan, Lander, & Rosset, 2014; Haseman & Elston, 1972). For dichotomous traits, generalized linear mixed models (GLMM) or Liability Threshold Models have often been utilized (Burton et al, 1999; Chen et al, 2016; Papachristou, Ober, & Abney, 2011; Zhou et al, 2018). However, the variance estimates from GLMM are biased for ascertained samples, and it is not easy to define the proportion of variances attributable to the polygenic effect.…”

Section: Introduction (mentioning)

confidence: 99%


Heritability estimation of dichotomous phenotypes using a liability threshold model on ascertained family‐based samples

Kim, Kwak, Won. Genetic Epidemiology, 2019.


Numerous methods for estimating heritability have been proposed; however, unlike quantitative phenotypes, heritability estimation for dichotomous phenotypes is computationally and statistically complex, and the use of heritability is infrequent. In this study, we developed a statistical method to estimate heritability of dichotomous phenotypes using a liability threshold model in the context of ascertained family‐based samples. This model assumes that dichotomous phenotypes are determined by unobserved latent variables that are normally distributed and can be applied to general pedigree data. The proposed methods were applied to simulated data and Korean type‐2 diabetes family‐based samples, and the accuracy of the estimates provided by the experimental methods was compared with that of the established methods.


“…In general, mixed model approaches treat the phenotype as random, whereas the TDT and the TDT generalizations consider the phenotypes as fixed. SAIGE utilizes a saddle point approximation of the score statistic to deal with data sets with the imbalanced case-control ratio (W. Zhou et al, 2018). Therefore, the standard scenario assumes the random sampling of a quantitative trait from the population.…”

Section: Mixed Model Approaches (mentioning)

confidence: 99%

“…Very recently, a similar approach to GMMAT was proposed. SAIGE utilizes a saddle point approximation of the score statistic to deal with data sets with the imbalanced case-control ratio (W. Zhou et al, 2018). SAIGE is very flexible as it can incorporate unrelated and related samples and only requires the GRM information to correct for the corresponding structure.…”

Section: Mixed Model Approaches (mentioning)

confidence: 99%

“…The first two principal components for both scenarios, U and ADM, are plotted against each other in Supporting Information Figures 1 and 2 (Section C). Then, we tested the remaining 95,000 variants for each MAF using the null model and the score statistic-based SAIGE approach, implemented in the SAIGE R package (W. Zhou et al, 2018). First, we chose a phenotype structure where we randomly assigned 2,000 samples as affected and 1,000 as unaffected.…”

Section: Appendix B (mentioning)

confidence: 99%
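The citation statements above describe SAIGE as applying a saddlepoint approximation to the score statistic when the case-control ratio is imbalanced. The following is a minimal, illustrative sketch of that general idea only, not the SAIGE implementation: it assumes unrelated samples, a score statistic S = sum_i g_i (y_i - mu_i) with independent Bernoulli outcomes, and non-negative genotype codes such as 0, 1, 2. The function names `cgf_terms` and `spa_upper_tail` and the parameter `max_abs_t` are hypothetical, and the tail formula used is the Barndorff-Nielsen form of the saddlepoint approximation.

```python
import math


def cgf_terms(t, g, mu):
    """Return K(t), K'(t), K''(t) for S = sum_i g_i * (y_i - mu_i),
    where y_i ~ Bernoulli(mu_i) independently (an assumption of this sketch)."""
    k0 = k1 = k2 = 0.0
    for gi, mi in zip(g, mu):
        e = math.exp(gi * t)
        denom = 1.0 - mi + mi * e
        p = mi * e / denom              # exponentially tilted success probability
        k0 += math.log(denom) - t * gi * mi
        k1 += gi * (p - mi)
        k2 += gi * gi * p * (1.0 - p)
    return k0, k1, k2


def spa_upper_tail(s, g, mu, max_abs_t=50.0):
    """Approximate P(S >= s) with a saddlepoint approximation."""
    # Bracket the root of K'(t) = s; K' is increasing in t.
    lo, hi = -1.0, 1.0
    while cgf_terms(hi, g, mu)[1] < s:
        hi *= 2.0
        if hi > max_abs_t:
            raise ValueError("s is too extreme for this sketch")
    while cgf_terms(lo, g, mu)[1] > s:
        lo *= 2.0
        if lo < -max_abs_t:
            raise ValueError("s is too extreme for this sketch")

    # Plain bisection keeps the example dependency-free.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cgf_terms(mid, g, mu)[1] < s:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)

    k0, _, k2 = cgf_terms(t, g, mu)
    w = math.copysign(math.sqrt(max(0.0, 2.0 * (t * s - k0))), t)
    v = t * math.sqrt(k2)
    if abs(t) < 1e-6 or w == 0.0:
        # Near the mean the formula is numerically unstable; fall back to
        # the normal approximation using the null variance K''(0).
        var0 = cgf_terms(0.0, g, mu)[2]
        return 0.5 * math.erfc(s / math.sqrt(2.0 * var0))

    z = w + math.log(v / w) / w
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # 1 - Phi(z)


# Usage: 100 samples, genotype 1 for everyone, 10% expected case rate.
g = [1.0] * 100
mu = [0.1] * 100
p_value = spa_upper_tail(5.0, g, mu)
```

With a 10% case rate the distribution of S is right-skewed, which is the situation where a normal approximation to the tail tends to be least reliable and a saddlepoint correction is most useful.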


A comparison of popular TDT‐generalizations for family‐based association analysis

Hecker, Laird, Lange. Genetic Epidemiology, 2019.


The transmission disequilibrium test (TDT) is the gold standard for testing the association between a genetic variant and disease in samples consisting of affected individuals and their parents. In practice, more complex pedigree structures, that is siblings with no parents, or three‐generational pedigrees with possibly missing genotypes, are common. There are several generalizations of the TDT that are suitable for use with arbitrary pedigree structures. We consider three such frequently used generalizations, family‐based association test, pedigree disequilibrium test, and generalized disequilibrium test, that have accompanying software and compare them regarding validity and power in the single variant setting. We use simulations to study the effects of population admixture, populations whose genotypes are not in Hardy–Weinberg equilibrium (HWE), different pedigree structures, and the presence of linkage. Whereas our results show that some TDT generalizations can have a substantially increased Type 1 error, these tests are often used in substantive research without caveats about the validity of their Type 1 error. For the association analysis of rare variants in sequencing studies, region‐based extensions of the TDT generalizations, that rely on the postulated robustness of the single variant tests, have been proposed. We discuss the implications of our results for these region‐based extensions.


Genetic Risk Factors Associated With Preeclampsia and Hypertensive Disorders of Pregnancy

et al. 2023



Importance: A genetic contribution to preeclampsia susceptibility has been established but is still incompletely understood. Objective: To disentangle the underlying genetic architecture of preeclampsia and preeclampsia or other maternal hypertension during pregnancy with a genome-wide association study (GWAS) of hypertensive disorders of pregnancy. Design, Setting, and Participants: This GWAS included meta-analyses in maternal preeclampsia and a combination phenotype encompassing maternal preeclampsia and preeclampsia or other maternal hypertensive disorders. Two overlapping phenotype groups were selected for examination, namely, preeclampsia and preeclampsia or other maternal hypertension during pregnancy. Data from the Finnish Genetics of Pre-eclampsia Consortium (FINNPEC, 1990-2011), Finnish FinnGen project (1964-2019), Estonian Biobank (1997-2019), and the previously published InterPregGen consortium GWAS were combined. Individuals with preeclampsia or other maternal hypertension during pregnancy and control individuals were selected from the cohorts based on relevant International Classification of Diseases codes. Data were analyzed from July 2020 to February 2023. Exposures: The association of a genome-wide set of genetic variants and clinical risk factors was analyzed for the 2 phenotypes. Results: A total of 16 743 women with prior preeclampsia and 15 200 with preeclampsia or other maternal hypertension during pregnancy were obtained from FINNPEC, FinnGen, Estonian Biobank, and the InterPregGen consortium study (respective mean [SD] ages at diagnosis: 30.3 [5.5], 28.7 [5.6], 29.7 [7.0], and 28 [not available] years). The analysis found 19 genome-wide significant associations, 13 of which were novel. Seven of the novel loci harbor genes previously associated with blood pressure traits (NPPA, NPR3, PLCE1, TNS2, FURIN, RGL3, and PREX1). In line with this, the 2 study phenotypes showed genetic correlation with blood pressure traits. In addition, novel risk loci were identified in the proximity of genes involved in the development of placenta (PGR, TRPC6, ACTN4, and PZP), remodeling of uterine spiral arteries (NPPA, NPPB, NPR3, and ACTN4), kidney function (PLCE1, TNS2, ACTN4, and TRPC6), and maintenance of proteostasis in pregnancy serum (PZP). Conclusions and Relevance: The findings indicate that genes related to blood pressure traits are associated with preeclampsia, but many of these genes have additional pleiotropic effects on cardiometabolic, endothelial, and placental function. Furthermore, several of the associated loci have no known connection with cardiovascular disease but instead harbor genes contributing to maintenance of successful pregnancy, with dysfunctions leading to preeclampsialike symptoms.
