2019
DOI: 10.1089/cmb.2018.0093

Word and Sentence Embedding Tools to Measure Semantic Similarity of Gene Ontology Terms by Their Definitions

Abstract: The gene ontology (GO) database contains GO terms that describe biological functions of genes. Previous methods for comparing GO terms have relied on the fact that GO terms are organized into a tree structure. Under this paradigm, the locations of two GO terms in the tree dictate their similarity score. In this article, we introduce two new solutions for this problem by focusing instead on the definitions of the GO terms. We apply neural network-based techniques from the natural language processing (NLP) domain…

Cited by 17 publications
(23 citation statements)
References 23 publications
“…This paper will focus on the second data resource, the Gene Ontology itself. On this end, there have been efforts in developing distance metric for GO terms [7,12,14,16,18,21,28]. Most traditional methods for computing semantic similarity of GO terms rely on the Information Content (IC) and the GO tree.…”
Section: Introduction (mentioning)
confidence: 99%
“…Methods based on shared ancestors and IC values have two drawbacks. First, they do not consider the definitions of the GO terms which have been shown to yield better semantic similarity scores in many cases [7,14]. Second, they are unable to create vector representations of GO terms which then can be integrated into other annotation models to predict functions for protein sequences.…”
Section: Introduction (mentioning)
confidence: 99%