Genome-wide prediction of pathogenic gain- and loss-of-function variants from ensemble learning of a diverse feature set

Stein, David; Bayrak, Çiğdem Sevim; Wu, Yiming; Stenson, Peter D.; Cooper, David Neil; Itan, Yuval; Schlessinger, Avner

doi:10.1101/2022.06.08.495288

Cited by 5 publications (12 citation statements)

References 91 publications (283 reference statements)


“…Low-capacity models such as random forest and single layer perceptron fine-tuned ESM model is worse than pretrained PreMode except for SCN2A. We also compared it to LoGoFunc 42, a method trained on G/LoF variants across genes. PreMode is slightly worse than LoGoFunc in SCN5A but better in all other genes (Supplementary Figure 8a).…”

Section: Results (mentioning; confidence: 99%)

PreMode predicts mode-of-action of missense variants by deep graph representation learning of protein sequence and structural context

Zhong, Zhao, Zhuang et al., 2024 (preprint)


Accurate prediction of the functional impact of missense variants is fundamentally important for disease gene discovery, clinical genetic diagnostics, therapeutic strategies, and protein engineering. Previous efforts have focused on predicting a binary pathogenicity classification, but the functional impact of missense variants is multi-dimensional. Pathogenic missense variants in the same gene may act through different modes of action (i.e., gain/loss-of-function) by affecting multiple biochemical properties of the protein. They may result in distinct clinical conditions that require different treatments. We developed a new method, PreMode, to perform gene-specific mode-of-action predictions. PreMode models the effects of coding-sequence variants using SE(3)-equivariant graph neural networks on protein sequences and structures. Using the largest-to-date set of mode-of-action-labeled missense variants, we show that PreMode achieves state-of-the-art performance on multiple types of mode-of-action prediction through efficient transfer learning. Additionally, PreMode's predictions of G/LoF variants in a kinase are consistent with the transition between its inactive and active conformations. Finally, we show that PreMode enables improved mutagenesis analysis, clinical diagnosis, and, more broadly, artificial GoF engineering of proteins.


Section: Results (mentioning; confidence: 99%), a second citation statement from the same PreMode preprint


“…Future studies with even larger sample sizes may reclassify some portions of the genes tested here, identifying some regions we erroneously excluded or identifying new regions to include. Other improvements will involve focusing on certain classes of variants in the gene, such as those computationally predicted to be gain or loss of function, those with functional data from screenings, and those in transcripts expressed in tissues of interest 19,26,27.…”

Section: Discussion (mentioning; confidence: 99%)

A power-based sliding window approach to evaluate the clinical impact of rare genetic variants

Cirulli, Barrett, Bolze et al., 2022 (preprint)


Systematic determination of rare and novel variant pathogenicity remains a major challenge, even when there is an established association between a gene and a phenotype. Here we present Power Window (PW), a novel sliding-window technique that identifies the clinically impactful regions of a gene using population-scale clinico-genomic datasets. By sizing windows based on the number of variant carriers, rather than the number of variants or nucleotides, statistical power is held constant during analysis, enabling the localization of clinical impact as well as the removal of unassociated gene regions. The method can focus on specific variant types, such as loss of function (LoF) or other coding variants; on parts of a gene, such as those expressed in different tissues; or on isolating gene regions with opposite directions of effect. Using a training set of 300k exomes from the UK Biobank (UKB), we developed PW-based LoF and coding models for well-established gene-disease associations and tested their accuracy in two additional cohorts (128k exomes from the UKB and 30k exomes from the Healthy Nevada Project (HNP)). The significant PW models retained a mean of 64% of the rare variant carriers in each gene (range 16-98%). For quantitative traits, they showed a mean effect size improvement of 48% compared with aggregating rare variants across the entire gene, and for binary traits they improved the odds ratios by a mean of 2.4-fold. PW shows that EHR-based statistical analyses can accurately distinguish between novel coding variants that will have high phenotypic penetrance in a population and those that will not, unlocking new potential for population genetic screening.
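The carrier-based window sizing described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `carrier_sized_windows`, the input format (a list of `(position, carrier_count)` pairs for one gene) and the threshold `min_carriers` are hypothetical. Carriers are simply summed per variant, so an individual carrying two variants in the same window is counted twice. The sketch only builds the windows. The association testing and the removal of unassociated regions that PW performs are not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Window:
    start: int     # position of the first variant in the window
    end: int       # position of the last variant in the window
    carriers: int  # summed carrier count of the variants in the window


def carrier_sized_windows(
    variants: List[Tuple[int, int]], min_carriers: int
) -> List[Window]:
    """Hypothetical sketch: slide over one gene's variants, sorted by position,
    and grow each window until it holds at least `min_carriers` carriers.

    `variants` is a list of (position, carrier_count) pairs. Carriers are summed
    per variant, so a person with two variants in a window is counted twice.
    """
    if min_carriers < 1:
        raise ValueError("min_carriers must be at least 1")
    variants = sorted(variants)
    windows: List[Window] = []
    end = 0    # exclusive right edge of the current window
    total = 0  # carriers currently inside variants[start:end]
    for start in range(len(variants)):
        # Extend the right edge until the window reaches the carrier target.
        while end < len(variants) and total < min_carriers:
            total += variants[end][1]
            end += 1
        if total < min_carriers:
            break  # the remaining variants cannot fill another window
        windows.append(Window(variants[start][0], variants[end - 1][0], total))
        # Drop the left-most variant before the window slides right.
        total -= variants[start][1]
    return windows


if __name__ == "__main__":
    demo = [(100, 2), (150, 1), (200, 3), (260, 1), (300, 2)]
    for w in carrier_sized_windows(demo, min_carriers=3):
        print(w)
```

Take the demo input with a threshold of 3. The sketch returns windows spanning positions 100-150, 150-200, 200-200 and 260-300, holding 3, 4, 3 and 3 carriers respectively. The windows differ in physical length but hold similar numbers of carriers. That is the property the abstract credits with keeping statistical power constant.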


“…Coding variants were defined as those impacting a protein-coding transcript annotated as a missense variant or predicted to have "high" impact. We also retrieved predicted gain- or loss-of-function (GoLoF) variants from LoGoFunc [47], and linked non-coding variants to genes using activity-by-contact (ABC) maps [41]. ABC scores represent the contribution of an enhancer to the regulation of a gene, measured by multiplying the estimates of enhancer activity and three-dimensional contact frequencies between enhancers and promoters.…”

Section: Variant Annotation (mentioning; confidence: 99%)
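The ABC definition quoted above can be sketched in Python. The excerpt describes the score as the product of enhancer activity and enhancer-promoter contact frequency. The sketch also divides each product by the sum over all candidate elements for the same gene. To our understanding, this is how the original ABC model turns the product into a relative contribution. The function `abc_scores`, the dictionaries and the element names are hypothetical. The sketch does not cover how activity and contact are estimated or which elements count as candidates.

```python
from typing import Dict


def abc_scores(activity: Dict[str, float], contact: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical sketch of ABC scores for the candidate elements of one gene.

    activity[e]: estimated activity of candidate element e
    contact[e]:  estimated 3D contact frequency between e and the gene's promoter
    Each score is A*C for that element divided by the sum of A*C over all
    candidates, so the scores of one gene sum to 1 (or are all 0).
    """
    products = {e: activity[e] * contact[e] for e in activity}
    total = sum(products.values())
    if total == 0:
        return {e: 0.0 for e in products}
    return {e: p / total for e, p in products.items()}


if __name__ == "__main__":
    act = {"enh1": 2.0, "enh2": 1.0, "enh3": 1.0}
    con = {"enh1": 0.5, "enh2": 1.0, "enh3": 0.0}
    print(abc_scores(act, con))
```

In the demo, `enh1` and `enh2` have equal activity-contact products, so each receives a score of 0.5. `enh3` has no contact and scores 0, however active it is.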

“…We prioritized genes as putatively causal using a combination of evidence including MR, colocalization H4 posterior probabilities (PP) with molQTL, presence of an associated GoLoF variant [47] or other coding variants, distance to the lead variant, and enhancer-promoter ABC scores [41]. Specifically, we ranked genes as follows: […] For a given locus, we then prioritized the best gene(s) as the one with the highest rank.…”

Section: Causal Gene Prioritization (mentioning; confidence: 99%)
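The excerpt omits the ranking criteria, so they are not reconstructed here. The final selection step, which keeps the best gene(s) at each locus, can still be sketched under stated assumptions. Each candidate is assumed to already carry a hypothetical numeric `rank_score`, where larger means stronger evidence. All genes tied at the top are kept, matching the "gene(s)" wording. The function name and the locus and gene identifiers are placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def best_genes_per_locus(
    candidates: Iterable[Tuple[str, str, float]]
) -> Dict[str, List[str]]:
    """Hypothetical sketch: keep the top-ranked gene(s) at each locus.

    `candidates` yields (locus_id, gene, rank_score) triples, where a larger
    rank_score is assumed to mean stronger evidence. All genes tied at the
    top score are kept, in alphabetical order.
    """
    by_locus: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for locus, gene, score in candidates:
        by_locus[locus].append((score, gene))
    best: Dict[str, List[str]] = {}
    for locus, entries in by_locus.items():
        top = max(score for score, _ in entries)
        best[locus] = sorted(gene for score, gene in entries if score == top)
    return best


if __name__ == "__main__":
    demo = [
        ("locus1", "GENE_A", 3), ("locus1", "GENE_B", 5),
        ("locus2", "GENE_C", 2), ("locus2", "GENE_D", 2),
    ]
    print(best_genes_per_locus(demo))
```

Keeping ties instead of breaking them arbitrarily means a locus with equally supported genes stays ambiguous in the output.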


Leveraging large-scale multi-omics to identify therapeutic targets from genome-wide association studies

Lessard, Chao, Reis et al., 2023 (preprint)


BACKGROUND: Therapeutic targets supported by genetic evidence from genome-wide association studies (GWAS) show a higher probability of success in clinical trials. GWAS is a powerful approach to identifying links between genetic variants and phenotypic variation; however, identifying the genes driving the associations found by GWAS remains challenging. Integrating molecular quantitative trait loci (molQTL), such as expression QTL (eQTL), through Mendelian randomization (MR) and colocalization analyses can help identify causal genes. Careful interpretation remains warranted because an eQTL can affect the expression of multiple genes within the same locus.

METHODS: We used a combination of genomic features, including variant annotation, activity-by-contact maps, MR, and colocalization with molQTL, to prioritize causal genes across 4,611 disease GWAS and meta-analyses from biobank studies, namely FinnGen, Estonian Biobank and UK Biobank.

RESULTS: Genes identified using this approach are enriched for gold-standard causal genes and capture known biological links between disease genetics and biology. In addition, we find that eQTLs colocalizing with GWAS signals are statistically enriched for the corresponding disease-relevant tissues. We show that the directionality predicted by MR is generally consistent with the mechanisms of action of matched drugs (>78% for approved drugs). Compared with the nearest-gene mapping method, our approach also shows a higher enrichment in approved therapeutic targets (risk ratio 1.38 vs 2.06). Finally, using this approach, we detected a novel association between the IL-6 receptor signal transduction gene IL6ST and polymyalgia rheumatica, an indication for which sarilumab, a monoclonal antibody against the IL-6 receptor, has recently been approved.

CONCLUSIONS: Combining variant annotation and activity-by-contact maps with molQTL improves the identification of causal genes. It also informs on directionality, which can be translated into successful target identification and drug development.

