1999
DOI: 10.1613/jair.514

Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language

Abstract: This article presents a measure of semantic similarity in an is-a taxonomy based on the notion of shared information content. Experimental evaluation against a benchmark set of human similarity judgments demonstrates that the measure performs better than the traditional edge-counting approach. The article presents algorithms that take advantage of taxonomic similarity in resolving syntactic and semantic ambiguity, along with experimental results demonstrating their effectiveness.

Cited by 1,681 publications (1,474 citation statements)
References 54 publications (50 reference statements)
“…Automated approaches for representing the semantic content of terms and similarity and relatedness between them have been widely used in a number of Natural Language Processing (NLP) applications in both general English (Budanitsky and Hirst, 2006; Landauer, 2006; Resnik, 1999; Weeds and Weir, 2005) and specialized terminological domains such as bioinformatics (Ferreira et al, 2013; Lord et al, 2003; Mazandu et al, 2016; Wang et al, 2007; Yang et al, 2012) and medicine (Garla and Brandt, 2012; Lee et al, 2008; Liu et al, 2012; Pakhomov et al, 2010; Pedersen et al, 2007; Sajadi, 2014). A subset of these methods, distributional semantics, relies on the co-occurrence information between words obtained from large corpora of text and makes the assumption that words with similar or related meanings tend to occur in similar contexts.…”
Section: Introduction (mentioning)
confidence: 99%
“…For every branch (biological process, molecular function and cellular component) of GO taxonomy, a net is constructed by considering the GO terms the 8449 genes are annotated with, and by setting an edge between two genes if they share at least one annotation in the corresponding GO branch. The edge weight is the maximum Resnik semantic similarity [54] between all the terms for which the two genes are both annotated.…”
Section: Benchmark Data (mentioning)
confidence: 99%
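
The edge-weight rule quoted above can be sketched as follows. This is a minimal illustration, not the citing paper's code: it assumes the maximum is taken over all pairs of terms drawn from the two genes' annotation sets, and the names `edge_weight`, `toy_term_sim` and `TOY_SCORES` are hypothetical, as are the toy scores.

```python
from itertools import product

def edge_weight(terms_a, terms_b, term_sim):
    """Maximum term-to-term similarity over all pairs of annotations.

    terms_a, terms_b: iterables of term identifiers for two genes.
    term_sim: any function (term, term) -> float, for example Resnik similarity.
    Returns 0.0 when either gene has no annotations.
    """
    pairs = product(terms_a, terms_b)
    return max((term_sim(s, t) for s, t in pairs), default=0.0)

# Hypothetical similarity scores, for illustration only.
TOY_SCORES = {
    frozenset(("t1", "t2")): 0.4,
    frozenset(("t1", "t3")): 1.2,
    frozenset(("t2", "t3")): 0.7,
}

def toy_term_sim(s, t):
    """Stand-in for a real term similarity; symmetric lookup in TOY_SCORES."""
    if s == t:
        return 2.0
    return TOY_SCORES.get(frozenset((s, t)), 0.0)

weight = edge_weight(["t1"], ["t2", "t3"], toy_term_sim)
```

Passing the term similarity in as a function keeps the aggregation step separate from whichever measure is plugged in.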
“…Resnik: Resnik [54]. The similarity of two nodes is the information content of their minimum common ancestor:…”
Section: Disease Semantic Similarities (mentioning)
confidence: 99%
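
A minimal sketch of the measure described in this statement, assuming an is-a taxonomy stored as a child-to-parents mapping and a precomputed information content (IC) per node. The similarity is taken as the highest IC among the ancestors shared by the two nodes, with each node counted as its own ancestor. The function names and the toy taxonomy and IC values are hypothetical.

```python
def ancestors(node, parents):
    """Return the node itself plus every node reachable via is-a links."""
    seen = {node}
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def resnik_similarity(a, b, parents, ic):
    """IC of the most informative common ancestor of a and b (0.0 if none)."""
    common = ancestors(a, parents) & ancestors(b, parents)
    return max((ic[c] for c in common), default=0.0)

# Toy taxonomy and IC values, invented for illustration.
PARENTS = {
    "dog": ["mammal"],
    "cat": ["mammal"],
    "trout": ["fish"],
    "mammal": ["animal"],
    "fish": ["animal"],
    "animal": [],
}
IC = {"animal": 0.0, "mammal": 1.0, "fish": 1.5,
      "dog": 2.5, "cat": 2.5, "trout": 3.0}

sim_dog_cat = resnik_similarity("dog", "cat", PARENTS, IC)      # share "mammal"
sim_dog_trout = resnik_similarity("dog", "trout", PARENTS, IC)  # share only "animal"
```

In this toy data, nodes that share only the root score 0.0, because the root carries no information.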
“…Among the earlier developed methods, an IC-based measure called the Resnik measure has shown strong correlations between its results and gene expression similarities on yeast [16,22]. Mathematically, given a GO term t, its IC is defined as a negative log likelihood IC(t) = −log(|G_t| / |G_root|), where G_t and G_root are the sets of genes annotated to term t and the root term (including all of its descendants) respectively.…”
Section: Introduction (mentioning)
confidence: 99%
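
The IC definition quoted above can be worked through on a small example. This is a sketch under stated assumptions: the gene sets are taken to be already propagated, so each term's set includes genes annotated to its descendants, and the natural logarithm is used. The name `information_content` and the toy gene sets are hypothetical.

```python
import math

def information_content(genes_by_term, root):
    """Compute IC(t) = -log(|G_t| / |G_root|) for every non-empty term.

    genes_by_term: mapping term -> set of genes annotated to the term
                   or any of its descendants (assumed already propagated).
    root: key of the root term, whose gene set is the denominator.
    """
    total = len(genes_by_term[root])
    return {
        term: -math.log(len(genes) / total)
        for term, genes in genes_by_term.items()
        if genes  # IC is undefined for terms with no annotated genes
    }

# Toy annotation sets, invented for illustration.
GENES_BY_TERM = {
    "root": {"g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"},
    "binding": {"g1", "g2", "g3", "g4"},
    "kinase": {"g1", "g2"},
    "unused": set(),
}

ic = information_content(GENES_BY_TERM, "root")
```

With these sets the root has IC 0, "binding" (half of all genes) has IC log 2, and "kinase" (a quarter) has IC log 4, so rarer terms carry more information.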