2007
DOI: 10.1016/j.jbi.2006.06.004

Measures of semantic similarity and relatedness in the biomedical domain

Abstract: Measures of semantic similarity between concepts are widely used in Natural Language Processing. In this article, we show how six existing domain-independent measures can be adapted to the biomedical domain. These measures were originally based on WordNet, an English lexical database of concepts and relations. In this research, we adapt these measures to the SNOMED-CT ontology of medical concepts. The measures include two path-based measures, and three measures that augment path-based measures with information…

Cited by 440 publications (401 citation statements)
References 20 publications
“…In the current study, the performance on both similarity and relatedness benchmarks plateaued between 10M and 100M tokens, which is consistent with the findings of another previous study by Pedersen et al (2007), in which we found that the performance of another distributional semantics corpus-based approach to computing semantic relatedness plateaued after the training corpus reached 300 000 clinical notes (~66M tokens). These findings may be interpreted as providing additional evidence to show that the size of the corpus used for distributional semantic representations of medical terms does not matter beyond a certain point (e.g.…”
Section: Discussion (supporting)
confidence: 92%
“…Automated approaches for representing the semantic content of terms and similarity and relatedness between them have been widely used in a number of Natural Language Processing (NLP) applications in both general English (Budanitsky and Hirst, 2006; Landauer, 2006; Resnik, 1999; Weeds and Weir, 2005) and specialized terminological domains such as bioinformatics (Ferreira et al, 2013; Lord et al, 2003; Mazandu et al, 2016; Wang et al, 2007; Yang et al, 2012) and medicine (Garla and Brandt, 2012; Lee et al, 2008; Liu et al, 2012; Pakhomov et al, 2010; Pedersen et al, 2007; Sajadi, 2014). A subset of these methods, distributional semantics, relies on the co-occurrence information between words obtained from large corpora of text and makes the assumption that words with similar or related meanings tend to occur in similar contexts.…”
Section: Introduction (mentioning)
confidence: 99%
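
The distributional assumption described in the statement above (words with similar meanings tend to occur in similar contexts) can be illustrated with a minimal sketch. This is not the method of Pedersen et al. or of the citing paper: the function names `cooccurrence_vectors` and `cosine`, the window size, and the three toy sentences are all assumptions made up for illustration.

```python
from collections import Counter
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Count, for each word, the words seen within `window` positions of it."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            vectors.setdefault(word, Counter()).update(left + right)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus, invented for this sketch (not real clinical notes).
corpus = [
    "patient reports chest pain and nausea",
    "patient reports chest discomfort and nausea",
    "insulin dose adjusted for diabetes",
]

vecs = cooccurrence_vectors(corpus)
sim_related = cosine(vecs["pain"], vecs["discomfort"])
sim_unrelated = cosine(vecs["pain"], vecs["insulin"])
```

In this toy corpus "pain" and "discomfort" share all of their context words, so `sim_related` is higher than `sim_unrelated`. A realistic system would need a far larger corpus, and the statements quoted here suggest that gains level off once the corpus is large enough.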
“…Pedersen et al [1] Proposed semantic similarity and relatedness in the biomedicine domain, by applied a corpus-based context vector approach to measuring the similarity between concepts in SNOMED-CT. Their context vector approach is ontology-free but requires training text, for which, they used text data from Mayo Clinic corpus of medical notes.…”
Section: Pedersen Measure (mentioning)
confidence: 99%
“…Word Net, UMLS/ICD10) to calculate the distance between the concept nodes in the ontology tree or hierarchy [1]. The second class of techniques uses training corpora and information content (IC) to estimate the semantic similarity and relatedness between two concepts.…”
Section: Introduction (mentioning)
confidence: 99%