2023
DOI: 10.1093/nargab/lqad095

Evaluation of input data modality choices on functional gene embeddings

Felix Brechtmann,
Thibault Bechtler,
Shubhankar Londhe
et al.

Abstract: Functional gene embeddings, numerical vectors capturing gene function, provide a promising way to integrate functional gene information into machine learning models. These embeddings are learnt by applying self-supervised machine-learning algorithms on various data types including quantitative omics measurements, protein–protein interaction networks and literature. However, downstream evaluations comparing alternative data modalities used to construct functional gene embeddings have been lacking. Here we bench…

Citations: cited by 2 publications (8 citation statements)
References: 56 publications
“…We also explored models that include complementary features from external datasets. Specifically, we incorporated co-essentiality modules from DepMap [59] and a 256-dimensional functional gene embedding that integrates protein–protein interactions (PPI), genome-wide deletion screen results from the DepMap project, co-expression from bulk RNA-seq and single-cell RNA-seq compendia [60] (see Methods). This resulted in an enhancement of overall performance (average precision increased from 23 to 36%, Figure S22, Table S11 and S12).…”
Section: Results
mentioning
confidence: 99%
“…The gene-level features consisted of 21 features from seven gene-level metrics of seven IntOGen tools, nine features from AbSplice-DNA scores, 22 features from OUTRIDER obtained using combinations of fold-change direction, significance, and effect size cutoffs, and, similarly, 11 features from NB-act and 22 features from FRASER (Supplementary Materials and Methods). For those models integrating external gene functional data, co-essential modules from DepMap [59] and 256-dimensional functional gene embeddings [60] were further included as features. In total, 377 genes listed among the hematologic panel genes (Supplementary Materials and Methods) or the hematologic malignancy driver genes from CGC GRCh37 v97 [54] were used as the positive class for the classifiers.…”
Section: Methods
mentioning
confidence: 99%
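
The feature assembly described in the statement above can be sketched as a column-wise concatenation. This is only an illustration under stated assumptions, not the citing authors' code: the per-source counts (21, 9, 22, 11, 22) and the 256-dimensional embedding come from the quoted text, while the function name, the dictionary layout, and the random placeholder values are hypothetical. The co-essential module features are left out because the excerpt does not say how many there are.

```python
import numpy as np

# Per-source gene-level feature counts as quoted in the statement above.
FEATURE_COUNTS = {
    "IntOGen": 21,
    "AbSplice-DNA": 9,
    "OUTRIDER": 22,
    "NB-act": 11,
    "FRASER": 22,
}
EMBEDDING_DIM = 256  # dimensionality of the functional gene embedding


def build_feature_matrix(gene_level, embedding):
    """Stack the per-source blocks and the embedding side by side.

    gene_level: dict mapping source name -> array of shape (n_genes, k)
    embedding:  array of shape (n_genes, EMBEDDING_DIM)
    (Hypothetical helper; rows are assumed to be aligned on the same genes.)
    """
    blocks = []
    for name, expected in FEATURE_COUNTS.items():
        block = np.asarray(gene_level[name], dtype=float)
        if block.shape[1] != expected:
            raise ValueError(f"{name}: expected {expected} columns, got {block.shape[1]}")
        blocks.append(block)
    blocks.append(np.asarray(embedding, dtype=float))
    return np.hstack(blocks)


# Placeholder data for five genes (random values, for shape checking only).
rng = np.random.default_rng(0)
n_genes = 5
gene_level = {name: rng.normal(size=(n_genes, k)) for name, k in FEATURE_COUNTS.items()}
embedding = rng.normal(size=(n_genes, EMBEDDING_DIM))
X = build_feature_matrix(gene_level, embedding)
```

With these counts, the gene-level block has 85 columns and the combined matrix has 341 columns before any module features are added.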
“…For each sample, the covariates age (UKBB data field 21003), age², sex (UKBB data field 31), age:sex, age²:sex, and the first 20 genetic principal components (PCs) (UKBB data field 22009_0.[1-20]) were obtained directly from the UK Biobank. All of these were used as covariates for the burden test and for FuncRVP.…”
Section: Covariates and Polygenic Risk Scores (PRS)
mentioning
confidence: 99%
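
The covariate set listed in the statement above (age, age², sex, age:sex, age²:sex, and 20 genetic PCs) amounts to a 25-column design matrix. A minimal sketch follows, assuming sex is encoded as 0/1 and that the interaction terms are plain element-wise products; the function name and the toy values are hypothetical and are not taken from the citing paper.

```python
import numpy as np


def build_covariates(age, sex, pcs):
    """Return an (n_samples, 5 + n_pcs) covariate matrix.

    Columns: age, age^2, sex, age*sex, age^2*sex, followed by the PCs.
    Assumes sex is numerically encoded (e.g. 0/1); hypothetical helper.
    """
    age = np.asarray(age, dtype=float)
    sex = np.asarray(sex, dtype=float)
    pcs = np.asarray(pcs, dtype=float)
    age_sq = age ** 2
    base = np.column_stack([age, age_sq, sex, age * sex, age_sq * sex])
    return np.hstack([base, pcs])


# Toy example: two samples, 20 principal components set to zero.
covariates = build_covariates(age=[50, 60], sex=[0, 1], pcs=np.zeros((2, 20)))
```

In practice the columns would usually be standardized before model fitting, but the excerpt does not say whether that was done.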
“…Moreover, FuncRVP led to more robust gene effect estimates and to increased gene discoveries notably among genes that are more genetically constrained. Altogether, these results demonstrate that the integration of functional information across genes improves rare-variant phenotype prediction and gene discovery. Recently, functional gene embeddings, numerical vectors capturing gene function such that genes with similar functions are close in the embedding space, have been proposed as an alternative to gene sets to represent gene functions [18][19][20]. Since they are vector representations rather than sets, functional gene embeddings can retain some quantitative information about functional similarities and are more straightforwardly integrated into machine learning algorithms.…”
mentioning
confidence: 99%
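
The idea that genes with similar functions lie close together in the embedding space can be illustrated with a similarity measure between vectors. The sketch below uses cosine similarity, which is one common choice and an assumption here; the excerpt does not name a metric. The gene names and three-dimensional vectors are made up for illustration.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors (1 = same direction)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Hypothetical toy embeddings; real functional embeddings are much larger.
embeddings = {
    "GENE_A": [1.0, 0.0, 1.0],
    "GENE_B": [0.9, 0.1, 1.1],   # constructed to resemble GENE_A
    "GENE_C": [-1.0, 1.0, 0.0],  # constructed to differ from GENE_A
}

sim_ab = cosine_similarity(embeddings["GENE_A"], embeddings["GENE_B"])
sim_ac = cosine_similarity(embeddings["GENE_A"], embeddings["GENE_C"])
```

Unlike membership in a gene set, which is yes or no, the similarity is a graded quantity, which is the "quantitative information" the statement refers to.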