Sequential model selection for word sense disambiguation

Pedersen, Ted; Bruce, Rebecca; Wiebe, Janyce

doi:10.3115/974557.974613

Cited by 8 publications

(5 citation statements)

References 18 publications


“…Decision trees, a well-known supervised machine learning algorithm, have been used by many researchers to carry out WSD in various languages and have performed better in a number of comparative studies [6][7]. In [6] the WSD model is implemented using bag-of-words features, while the study described in [7] uses different features, such as the part of speech of neighboring words, morphology, and collocations, to build the model.…”

Section: Literature Review (mentioning)

confidence: 99%

Towards Developing Word Sense Disambiguation System for Kashmiri Language

Mir, Lawaye

2023

SMSJ


Background: A word, phrase, sentence, or other communication is “ambiguous” if it can be interpreted in multiple ways. The process of assigning the correct meaning to a word with respect to its context is known as Word Sense Disambiguation (WSD). WSD is an important problem in Natural Language Processing (NLP) that requires proper attention, as it affects the performance of various NLP applications.

Objectives: This paper makes a first attempt at a supervised machine learning WSD system for Kashmiri.

Material & Methods: The dataset for this study, comprising 500K tokens, was collected from different resources. A sense-annotated corpus for fifty commonly used ambiguous Kashmiri words was created by manual annotation. Kashmiri WordNet is used to extract senses for the target words. A decision-tree classifier is trained on features extracted from the annotated corpus to carry out the WSD task. A context window of ±3 is used to extract the features on which the classifier is trained.

Results: The proposed system is tested on all fifty target words and evaluated using accuracy, precision, recall, and F1 measures. It achieved 81.831% accuracy, 0.834 precision, 0.816 recall, and 0.824 F1-measure.

Conclusions: This was the initial step towards developing a WSD system for Kashmiri, and it has shown good results. In the future we expect to use other algorithms to carry out this task with greater language coverage.
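The ±3 context window mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `context_window_features`, the `<PAD>` placeholder, the `w-1`/`w+1` feature naming, and the English example sentence are all assumptions made for illustration.

```python
def context_window_features(tokens, target_index, window=3):
    """Collect the words within +/- `window` positions of the target word.

    Hypothetical helper: returns a dict keyed by signed offset ("w-1", "w+2").
    """
    features = {}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # skip the ambiguous target word itself
        position = target_index + offset
        # Use a placeholder when the window runs past the sentence boundary.
        in_range = 0 <= position < len(tokens)
        features[f"w{offset:+d}"] = tokens[position] if in_range else "<PAD>"
    return features


# Illustrative English sentence (the study itself uses Kashmiri text).
sentence = "he sat on the bank of the river".split()
features = context_window_features(sentence, sentence.index("bank"))
print(features)
```

Feature dictionaries of this shape could then be vectorized (for example with scikit-learn's `DictVectorizer`) and passed to a decision-tree learner, although the abstract does not say which toolkit was used.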



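“…Multidimensional Scaling (MDS) is applied to reduce the dimensionality of this matrix, Euclidean distance is then used to compute the distance between vector endpoints, and finally the words are clustered according to semantic distance. In experiments on a vocabulary drawn from Usenet newsgroup postings, Burgess and Lund found that HAL discriminates both word type and part of speech very accurately. Lin and Pantel [54] used the Clustering by Committee (CBC) method to discover clusters […] In addition, Pedersen and Bruce [55,56] and Purandare and Pedersen [57] also applied clustering algorithms to word sense disambiguation; these are not described further here. Dagan [5] pointed out in 1991 that two languages contain more information than one, and in 1994 [58] he explored using a second language to help word sense disambiguation. Resnik and Yarowsky [59] formally introduced WSD methods based on bilingual corpora in a conference paper. In recent years, published papers on bilingual word sense disambiguation have improved considerably in both number and quality; for example, Escudero et al. [60], Ide et al. [61], and Cong Li et al. [62] actively promoted the use of bilingual corpora in WSD research. Ng et al. [63] applied the Chinese-English bilingual corpus provided by the Linguistic Data Consortium (LDC) to word sense disambiguation, built sense classifiers with a Naïve Bayes model, and tested 29 nouns from SENSEVAL-2. Comparing the results P obtained with the parallel corpus against the results M obtained with the manually annotated corpus, P generally exceeded or approached M, which suggests that parallel corpora are promising for training machine learning models. In 1999, Diab [64] introduced the unsupervised word sense disambiguation system SALAAM. The system automatically generates token-level alignments and can simultaneously produce sense-annotated corpora for English, German, French, and Spanish, thereby providing a multilingual framework for the data acquisition problem in word sense disambiguation. In 2003, Diab [65] further improved SALAAM and argued that the improved system, as an unsupervised system, was the best performer at the time on the SENSEVAL-2 English all-words disambiguation task. In 2004, Diab [66] used the method to enhance an Arabic word sense disambiguation system, an example of its extension to multiple languages; in the same year, Diab [67] used SALAAM to automatically generate a fairly large annotated corpus and then used it as training data to enhance a supervised WSD system. Bhattacharya et al. [68] made full use of the semantic and conceptual hierarchy of the large knowledge base WordNet to determine the structure of two probabilistic models (a semantic model and a concept model). Once the models were built, the probability parameters were trained with the standard EM algorithm. Experimental results showed that the semantic model built by Bhattacharya et al. outperformed Diab's SALAAM system on word sense disambiguation, and that the concept model was in turn much stronger than the semantic model. In China, the transformation-based unsupervised method for Chinese word sense disambiguation proposed by Li Juanzi and Huang Changning [23] is also fairly representative.…”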

Section: Introduction to Wikipedia (unclassified)

Research on Unsupervised Word Sense Disambiguation

Wang, Kong

2009

Journal of Software


The goal of this paper is to give a brief summary of the current unsupervised word sense disambiguation techniques in order to facilitate future research. First of all, the significance of unsupervised word sense disambiguation study is introduced. Then, key techniques of various unsupervised word sense disambiguation studies at home and abroad are reviewed, including data sources, disambiguation methods, evaluation system and the achieved performance. Finally, 14 novel unsupervised word sense disambiguation methods are summarized, and the existing research and possible direction for the development of unsupervised word sense disambiguation study are pointed out.


“…Ted Pedersen [16] presented an experimental comparison of three unsupervised learning algorithms: McQuitty's similarity analysis, Ward's minimum-variance method, and the EM algorithm. Amruta Purandare [17] systematically compared unsupervised word sense discrimination techniques that cluster instances of a target word using both vector and similarity spaces.…”

Section: Introduction (mentioning)

confidence: 99%

Semi-Supervised Word Sense Disambiguation via Context Weighting

Zhao, Zuo

2014

AMR


Word sense disambiguation, a central research topic in natural language processing, can promote the development of many applications such as information retrieval, speech synthesis, machine translation, summarization and question answering. Previous approaches can be grouped into three categories: supervised, unsupervised and knowledge-based. Supervised methods achieve the highest accuracy, but they suffer from the knowledge acquisition bottleneck. Unsupervised methods avoid the knowledge acquisition bottleneck, but their performance is not satisfactory. With the build-up of large-scale knowledge resources, the knowledge-based approach has attracted more and more attention. This paper introduces a new context weighting method and, based on it, proposes a novel semi-supervised approach for word sense disambiguation. The significant contribution of our method is that a thesaurus and machine learning techniques are integrated in word sense disambiguation. Compared with the state of the art on the test data of the English all-words disambiguation task in Senseval-3, our method yields clear improvements in the disambiguation of nouns, adjectives and verbs.

