2005
DOI: 10.1016/j.ijmedinf.2005.03.013
Effects of information and machine learning algorithms on word sense disambiguation with small datasets

Abstract: Current approaches to word sense disambiguation use (and often combine) various machine learning techniques. Most refer to characteristics of the ambiguity and its surrounding words and are based on thousands of examples. Unfortunately, developing large training sets is burdensome, and in response to this challenge, we investigate the use of symbolic knowledge for small datasets. A naïve Bayes classifier was trained for 15 words with 100 examples for each. Unified Medical Language System (UMLS) semanti…
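To make the approach described in the abstract concrete, here is a minimal sketch of naïve Bayes word sense disambiguation on a small labelled dataset, assuming scikit-learn and plain bag-of-words context features; the ambiguous word, the example sentences, and the sense labels are hypothetical, and the UMLS semantic-type features the paper adds are not reproduced here.

```python
# Minimal sketch: naive Bayes word sense disambiguation on a small dataset.
# Assumes scikit-learn. The word "cold", its contexts, and the sense labels
# are hypothetical; the study used 100 examples for each of 15 ambiguous words
# and also added UMLS semantic-type features, which are omitted here.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Toy training set: sentences containing the ambiguous word, labelled by sense.
contexts = [
    "patient presented with a cold and persistent cough",
    "symptoms of the common cold lasted for a week",
    "the sample was stored at a cold temperature overnight",
    "cold storage of the reagents is required",
    "he caught a cold after the long flight",
    "keep the vaccine in a cold chain during transport",
]
senses = ["illness", "illness", "temperature", "temperature", "illness", "temperature"]

# Bag-of-words context features feeding a multinomial naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())

# With so few examples, cross-validation gives only a rough accuracy estimate.
scores = cross_val_score(model, contexts, senses, cv=3)
print("mean accuracy:", scores.mean())

# Fit on all examples and disambiguate a new occurrence of the word.
model.fit(contexts, senses)
print(model.predict(["the reagent must remain cold until use"]))
```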

Cited by 43 publications (40 citation statements)
References 13 publications
“…This has already been shown in the literature [14-16]. The MFS indicates that usually one sense of the ambiguous word is highly represented compared to the rest of the senses.…”
Section: Results (supporting)
confidence: 65%
“…Among the knowledge-based methods we find the Journal Descriptor Indexing method [12] and several based on graph algorithms [13]. Machine learning algorithms have been explored in several studies where alternative combinations of features are compared [14-16]; these studies obtain a performance of over 0.86 in terms of accuracy using the collection prepared by Weeber et al [17]. …”
Section: Introduction (mentioning)
confidence: 99%
“…We noticed such inconsistencies in prior work by us [30] and by others [19]. In the present testbed, we encountered a few omissions i.e., annotations that were identified by our parser but not by the human annotators.…”
Section: Parser Evaluation (mentioning)
confidence: 59%
“…In order to compare our method with other methods in verifying the efficiency of extracting coordinate relationship, we adopt the following two kinds of methods, Naïve Bayesian (NB) [20][21][22][23] and Support Vector Machine (SVM) [24][25][26][27]. The detailed descriptions for these two methods are as follows.…”
Section: B. Comparison Methods (mentioning)
confidence: 99%
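For context, the kind of naïve Bayes versus SVM baseline comparison mentioned in the statement above could be sketched roughly as follows, assuming scikit-learn; the sentences, labels, and TF-IDF features are hypothetical stand-ins for the coordinate-relationship data used in the citing paper.

```python
# Minimal sketch of an NB vs. SVM baseline comparison like the one cited above.
# Assumes scikit-learn; the sentences and labels are hypothetical stand-ins for
# the coordinate-relationship extraction task described in the citing paper.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Toy labelled sentences: 1 = the sentence expresses a coordinate term pair, 0 = not.
texts = [
    "fever and headache were the main symptoms",
    "aspirin and ibuprofen were both administered",
    "the tumour was located in the left lung",
    "nausea and vomiting persisted for two days",
    "the catheter was inserted into the vein",
    "the dose was reduced after the first week",
]
labels = [1, 1, 0, 1, 0, 0]

# Compare the two baseline classifiers using identical TF-IDF features.
for name, clf in [("Naive Bayes", MultinomialNB()), ("SVM", SVC(kernel="linear"))]:
    pipe = make_pipeline(TfidfVectorizer(), clf)
    scores = cross_val_score(pipe, texts, labels, cv=3)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```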