Proceedings of the Nineteenth Conference on Computational Natural Language Learning 2015
DOI: 10.18653/v1/k15-1037
One Million Sense-Tagged Instances for Word Sense Disambiguation and Induction

Abstract: Supervised word sense disambiguation (WSD) systems are usually the best performing systems when evaluated on standard benchmarks. However, these systems need annotated training data to function properly. While there are some publicly available open source WSD systems, very few large annotated datasets are available to the research community. The two main goals of this paper are to extract and annotate a large number of samples and release them for public use, and also to evaluate this dataset against some word…

Cited by 61 publications (63 citation statements)
References 14 publications
“…These approaches have shown the potential of using word embeddings on the WSD task. Iacobacci et al. (2016) carried out a comparison of different strategies for integrating word embeddings as a feature in WSD.…”
[7] As already noted by Taghipour and Ng (2015a), supervised systems trained on only OMSTI obtain lower results than when trained along with SemCor, mainly due to OMSTI's lack of coverage in target word types.
[8] We used the original implementation available at http://www.comp.nus.edu.sg/~nlp/software.html
Section: Supervised
confidence: 94%
See 1 more Smart Citation
“…These approaches have shown the potential of using word embeddings on the WSD task. Iacobacci et al (2016) carried 7 As already noted by Taghipour and Ng (2015a), supervised systems trained on only OMSTI obtain lower results than when trained along with SemCor, mainly due to OM-STI's lack of coverage in target word types. 8 We used the original implementation available at http: //www.comp.nus.edu.sg/˜nlp/software.html out a comparison of different strategies for integrating word embeddings as a feature in WSD.…”
Section: Supervisedmentioning
confidence: 94%
“…• OMSTI (Taghipour and Ng, 2015a). OMSTI has already shown its potential as a training corpus by improving the performance of supervised systems which add it to existing training data (Taghipour and Ng, 2015a; Iacobacci et al., 2016). Table 1 shows some statistics of the WSD datasets and training corpora which we use in the evaluation framework.…”
Section: Sense-Annotated Training Corpora
confidence: 99%
“…SemCor (Miller et al., 1993) and OMSTI (Taghipour and Ng, 2015a). As sense inventory, we used WordNet 3.0 (Miller et al., 1990) for all open-class parts of speech.…”
Section: Discussion
confidence: 99%
“…• One Million Sense-Tagged Instances (Taghipour and Ng, 2015, OMSTI), a sense-annotated dataset obtained via a semi-automatic approach based on the disambiguation of a parallel corpus, i.e., the United Nations Parallel Corpus, performed by exploiting manually translated word senses. Because OMSTI integrates SemCor to increase coverage, to keep a level playing field we excluded the latter from the corpus.…”
Section: Semantic Network
confidence: 99%