2022
DOI: 10.1101/2022.06.08.495288
Preprint

Genome-wide prediction of pathogenic gain- and loss-of-function variants from ensemble learning of a diverse feature set

Abstract: Gain-of-function (GOF) variants yield increased or novel protein function while loss-of-function (LOF) variants yield diminished protein function. GOF and LOF variants can result in markedly varying phenotypes even when occurring in the same gene. Experimental approaches for identifying GOF and LOF are slow and costly, and computational tools cannot accurately discriminate between GOF and LOF variants. We developed LoGoFunc, an ensemble machine learning method for predicting pathogenic GOF, pathogenic LOF and …

Citations: cited by 5 publications (12 citation statements)
References: 91 publications (283 reference statements)
“…Low-capacity models such as random forest and single layer perceptron fine-tuned ESM model is worse than pretrained PreMode except for SCN2A. We also compared it to LoGoFunc [42], a method trained on G/LoF variants across genes. PreMode is slightly worse than LoGoFunc in SCN5A but better in all other genes (Supplementary Figure 8a).…”
Section: Results (mentioning)
confidence: 99%
“…Future studies with even larger sample sizes may reclassify some portions of the genes tested here, identifying some regions we erroneously excluded or identifying new regions to include. Other improvements will involve focusing on certain classes of variants in the gene, such as those computationally predicted to be gain or loss of function, those with functional data from screenings, and those in transcripts expressed in tissues of interest [19,26,27].…”
Section: Discussion (mentioning)
confidence: 99%
“…Coding variants were defined as those impacting protein coding transcript annotated as missense variant or predicted to have "high" impact. We also retrieved predicted gain or loss of function (GoLoF) variants from LoGoFunc [47], and linked non-coding variants to genes using activity-by-contact (ABC) maps [41]. ABC scores represent the contribution of an enhancer to the regulation of gene, measured by multiplying the estimates of enhancer activity and three-dimensional contact frequencies between enhancers and promoters.…”
Section: Variant Annotation (mentioning)
confidence: 99%
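
Editorial illustration (not part of the cited paper): the statement above describes an ABC score as enhancer activity multiplied by enhancer-promoter contact frequency. A minimal sketch of that product could look like the following. The function name `abc_scores` and the input values are hypothetical, and dividing by the sum over a gene's candidate enhancers is an added assumption that the quoted text does not state.

```python
from typing import List


def abc_scores(activity: List[float], contact: List[float]) -> List[float]:
    """Score each candidate enhancer of one gene as activity x contact.

    ASSUMPTION: the products are divided by their sum so that the scores
    for one gene add up to 1. The quoted statement only mentions the
    multiplication step.
    """
    if len(activity) != len(contact):
        raise ValueError("activity and contact must have the same length")
    # Raw contribution of each enhancer: activity times contact frequency.
    raw = [a * c for a, c in zip(activity, contact)]
    total = sum(raw)
    if total == 0:
        # No measurable contribution from any enhancer.
        return [0.0] * len(raw)
    return [r / total for r in raw]


# Hypothetical values for three candidate enhancers of a single gene.
scores = abc_scores(activity=[2.0, 1.0, 0.0], contact=[0.5, 1.0, 0.3])
```

With these made-up inputs the first two enhancers have the same raw product, so they share the score equally and the inactive third enhancer scores zero.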
“…We prioritized genes as putatively causal using a combination of evidence including MR, colocalization H4 posterior probabilities (PP) with molQTL, presence of an associated GoLoF variant [47] or other coding variants, distance to lead variant, and enhancer-promoter ABC scores [41]. Specifically, we ranked genes as follow: For a given locus, we then prioritized the best gene(s) as the one with the highest rank.…”
Section: Causal Gene Prioritization (mentioning)
confidence: 99%