Enrichment analyses identify shared associations for 25 quantitative traits in over 600,000 individuals from seven diverse ancestries

Smith, Stephen; Shahamatdar, Sahar; Cheng, Wei; Zhang, S; Paik, Joseph; Graff, Mariaelisa; Haiman, Christopher A.; Matise, Tara C.; North, K. E.; Peters, Ulrike; Gignoux, Chris; Wojcik, Genevieve L.; Crawford, Lorin; Ramachandran, Sohini

doi:10.1101/2021.04.20.440612

Cited by 4 publications

(5 citation statements)

References 191 publications

(415 reference statements)


“…As discussed in ref. 43, it is more ideal to consider the ancestry-trait-specific Bonferroni-corrected significance threshold. In our study, we only consider the Taiwanese population, and the maximum number of tested SNPs is 5,981,581 for all traits.…”

Section: Discussion (mentioning)

confidence: 99%

Phenome-wide analysis of Taiwan Biobank reveals novel glycemia-related loci and genetic risks for diabetes

et al. 2022


To explore the complex genetic architecture of common diseases and traits, we conducted comprehensive PheWAS of ten diseases and 34 quantitative traits in the community-based Taiwan Biobank (TWB). We identified 995 significantly associated loci with 135 novel loci specific to Taiwanese population. Further analyses highlighted the genetic pleiotropy of loci related to complex disease and associated quantitative traits. Extensive analysis on glycaemic phenotypes (T2D, fasting glucose and HbA1c) was performed and identified 115 significant loci with four novel genetic variants (HACL1, RAD21, ASH1L and GAK). Transcriptomics data also strengthen the relevancy of the findings to metabolic disorders, thus contributing to better understanding of pathogenesis. In addition, genetic risk scores are constructed and validated for absolute risks prediction of T2D in Taiwanese population. In conclusion, our data-driven approach without a priori hypothesis is useful for novel gene discovery and validation on top of disease risk prediction for unique non-European population.


“…Importantly, our method assumes only that causal genes for complex traits are shared across ancestries while making no assumptions on underlying eQTL architectures across ancestries. This is an important feature of our method considering recent findings that SNP-level replication across genetic ancestries is weaker than gene-level replication 36, and that only ∼30% of SNP-gene expression associations are shared between European- and African-American ancestry 39. Through extensive simulations, we demonstrate that MA-FOCUS’ ability to identify causal genes is superior to baseline approaches and is robust to data-dependent limitations (see Methods).…”

Section: Discussion (mentioning)

confidence: 99%

“…Instead, MA-FOCUS assumes only that the causal genes for a focal trait or disease are shared across ancestries. It is expected that gene-level effects are likely more transferable across ancestry groups than SNP-level effects as genes are inherently a more meaningful biological unit 36. As a result, MA-FOCUS leverages cross-ancestry heterogeneity in LD patterns and eQTL associations to identify causal genes with improved precision and accuracy when compared with alternative approaches.…”

Section: Introduction (mentioning)

confidence: 99%

Multi-ancestry fine-mapping improves precision to identify causal genes in transcriptome-wide association studies

Yuan, Conti, et al. 2022

Preprint


Transcriptome-wide association studies (TWAS) are a powerful approach to identify genes whose expression associates with complex disease risk. However, non-causal genes can exhibit association signals due to confounding by linkage disequilibrium patterns (LD) and eQTL pleiotropy at genomic risk regions which necessitates fine-mapping of TWAS signals. Here, we present MA-FOCUS, a multi-ancestry framework for the improved identification of genes underlying traits of interest. We demonstrate that by leveraging differences in ancestry-specific patterns of LD and eQTL signals, MA-FOCUS consistently outperforms single-ancestry fine-mapping approaches with equivalent total sample size across multiple metrics. We perform 15 blood trait TWAS using genome-wide summary statistics (average N_EA = 511k, N_AA = 13k) and lymphoblastoid cell line eQTL data from cohorts of primarily European and African continental ancestries. We recapitulate evidence demonstrating shared genetic architectures for eQTL and blood traits between the two ancestry groups and observe that gene-level effects correlate 20% more strongly across ancestries compared with SNP-level effects. We perform fine-mapping using MA-FOCUS and find evidence that genes at TWAS risk regions are more likely to be shared across ancestries rather than ancestry-specific. Using multiple lines of evidence to validate our findings, we find gene sets produced by MA-FOCUS are more enriched in hematopoietic categories compared to alternative approaches (P=1.73e-16). Our work demonstrates that including, and appropriately accounting for, genetic diversity can drive deeper insights into the genetic architecture of complex traits.

“…Individuals with HBA1C readings of 42-48 mmol/mol, a range associated with prediabetes, were not included in the analysis. Ancestry Mismatch Experiment: Individuals were first divided on the basis of ancestry, as in Smith et al 2021, identifying 349,411 individuals of self-identified European descent, and 4,967 individuals of African descent. The latter of which were identified both by self-identification and by an ADMIXTURE analysis as described in (Smith et al 2021).…”

Section: Unknown Class Example: Wheat Seeds Dataset (mentioning)

confidence: 99%

“…Ancestry Mismatch Experiment: Individuals were first divided on the basis of ancestry, as in Smith et al 2021, identifying 349,411 individuals of self-identified European descent, and 4,967 individuals of African descent. The latter of which were identified both by self-identification and by an ADMIXTURE analysis as described in (Smith et al 2021). Applying the HBA1C filter described above resulted in 8,631 individuals in the European/elevated cohort, 268 individuals in the African/elevated cohort, 243,283 individuals in the European/normal cohort and 2,532 individuals in the African/normal cohort.…”

Section: Unknown Class Example: Wheat Seeds Dataset (mentioning)

confidence: 99%

Enabling interpretable machine learning for biological data with reliability scores

Ahlquist, Sugden, Ramachandran 2022

Preprint

Self Cite


Machine learning has become an important tool across biological disciplines, allowing researchers to draw conclusions from large datasets, and opening up new opportunities for interpreting complex and heterogeneous biological data. Alongside the rapid growth of machine learning, there have also been growing pains: some models that appear to perform well have later been revealed to rely on features of the data that are artifactual or biased; this feeds into the general criticism that machine learning models are designed to optimize model performance over the creation of new biological insights. A natural question thus arises: how do we develop machine learning models that are inherently interpretable or explainable? In this manuscript, we describe reliability scores, a new concept for scientific machine learning studies that assesses the ability of a classifier to produce a reliable classification for a given instance. We develop a specific implementation of a reliability score, based on our work in Sugden et al. 2018 in which we introduced SWIF(r), a generative classifier for detecting selection in genomic data. We call our implementation the SWIF(r) Reliability Score (SRS), and demonstrate the utility of the SRS when faced with common challenges in machine learning including: 1) an unknown class present in testing data that was not present in training data, 2) systemic mismatch between training and testing data, and 3) instances of testing data that are missing values for some attributes. We explore these applications of the SRS using a range of biological datasets, from agricultural data on seed morphology, to 22 quantitative traits in the UK Biobank, and population genetic simulations and 1000 Genomes Project data. With each of these examples, we demonstrate how interpretability tools for machine learning like the SRS can allow researchers to interrogate their data thoroughly, and to pair their domain-specific knowledge with powerful machine-learning frameworks. 
We hope that this tool, and the surrounding discussion, will aid researchers in the biological machine learning space as they seek to harness the power of machine learning without sacrificing rigor and biological understanding.
