2019
DOI: 10.1186/s12864-018-5370-x

Gene2vec: distributed representation of genes based on co-expression

Abstract: Background: Existing functional descriptions of genes are categorical, discrete, and mostly produced through manual processes. In this work, we explore the idea of gene embedding, a distributed representation of genes, in the spirit of word embedding. Results: In a purely data-driven fashion, we trained a 200-dimension vector representation of all human genes, using gene co-expression patterns in 984 data sets from the GEO databases. These vectors capture functional relatedness of genes in terms of recovering known pathways - th…
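As a rough illustration of the co-expression idea in the abstract, the sketch below computes Pearson correlation between expression profiles and keeps strongly correlated gene pairs, which a word-embedding-style model could then treat as "context" pairs. The gene names, the toy expression values, the `coexpressed_pairs` helper, and the 0.9 threshold are all assumptions made for illustration; the abstract does not specify how pairs were selected.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        # A constant profile has no defined correlation; treat it as 0 here.
        return 0.0
    return cov / (sx * sy)


def coexpressed_pairs(expression, threshold=0.9):
    """Return gene pairs whose correlation is at least `threshold`.

    `threshold` is a hypothetical cutoff chosen for this sketch.
    """
    pairs = []
    for g1, g2 in combinations(sorted(expression), 2):
        if pearson(expression[g1], expression[g2]) >= threshold:
            pairs.append((g1, g2))
    return pairs


# Toy data: expression of three made-up genes across four samples.
toy_expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.1, 3.9, 6.2, 8.0],
    "GENE_C": [4.0, 1.0, 3.5, 0.5],
}

pairs = coexpressed_pairs(toy_expression)
print(pairs)
```

In this toy data only GENE_A and GENE_B rise together across samples, so they form the single retained pair. Pairs of this kind, pooled over many data sets, are the sort of input a skip-gram-style model could use to learn one vector per gene.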

Cited by 112 publications (78 citation statements)
References 13 publications (12 reference statements)
“…For feature extraction by PM, we chose Global Vectors for Word Representation (GloVe) by Pennington et al. [25], where GloVe tends to embed targets more closely if they share common pathways more frequently. GloVe is based on word embedding methods, which were also applied to extract gene features using Gene Ontology or coexpression [33,36].…”
Section: Results (mentioning)
confidence: 99%
“…Also, several neural embedding methods have been recently proposed which can learn complex features for nodes of networks. Such methods have been applied to both PPI and co-expression networks and shown to be useful for reconstructing functional relationships [70,71]. However, there is still not enough evidence of whether such methods can outperform simple GBA methods, such as gene co-expression based on Pearson correlation, which is very effective for BPO predictions [12].…”
Section: Prediction Methods (mentioning)
confidence: 99%
“…Much like NLP approaches can be used to represent and define words by their sentence context, scientists are using NLP to represent and define genomic elements, such as genes, by their genomic context. 10,17,18 Likewise NLP approaches are being applied to represent and define molecules and fragments as new mathematical structures; an approach that provides new insights and analytical opportunities for understanding chemical relationships and activities. 19,20 When applied to natural product chemistry and genome mining, these and other DL and NLP approaches are providing new insights into natural product diversity, chemical properties, and therapeutic potential.…”
Section: Advances In Deep Learning (mentioning)
confidence: 99%
“…Many other vectorization methods are directly influenced by (and in some cases are derivatives of) the revolutionary Word2Vec algorithm, although many have yet to be incorporated into the BGC space. 28 Biological interpretations of Word2Vec include DNA2Vec (DNA vectorization), 29 Gene2Vec (gene vectorization), 18 and ProtVec (protein and protein family vectorization). 30 While these latter vector-based representation approaches are not currently being leveraged for natural products, we expect them to be used more in coming years.…”
Section: Genome Annotation (mentioning)
confidence: 99%