Proceedings of the Nineteenth Conference on Computational Natural Language Learning 2015
DOI: 10.18653/v1/k15-1037
One Million Sense-Tagged Instances for Word Sense Disambiguation and Induction

Abstract: Supervised word sense disambiguation (WSD) systems are usually the best performing systems when evaluated on standard benchmarks. However, these systems need annotated training data to function properly. While there are some publicly available open source WSD systems, very few large annotated datasets are available to the research community. The two main goals of this paper are to extract and annotate a large number of samples and release them for public use, and also to evaluate this dataset against some word…

Cited by 61 publications (63 citation statements)
References 14 publications
“…These approaches have shown the potential of using word embeddings on the WSD task. Iacobacci et al. (2016) carried out a comparison of different strategies for integrating word embeddings as a feature in WSD.…”
[7] As already noted by Taghipour and Ng (2015a), supervised systems trained on only OMSTI obtain lower results than when trained along with SemCor, mainly due to OMSTI's lack of coverage in target word types.
[8] We used the original implementation available at http://www.comp.nus.edu.sg/~nlp/software.html
Section: Supervised
confidence: 94%
See 1 more Smart Citation
“…These approaches have shown the potential of using word embeddings on the WSD task. Iacobacci et al (2016) carried 7 As already noted by Taghipour and Ng (2015a), supervised systems trained on only OMSTI obtain lower results than when trained along with SemCor, mainly due to OM-STI's lack of coverage in target word types. 8 We used the original implementation available at http: //www.comp.nus.edu.sg/˜nlp/software.html out a comparison of different strategies for integrating word embeddings as a feature in WSD.…”
Section: Supervisedmentioning
confidence: 94%
“…• OMSTI (Taghipour and Ng, 2015a). OMSTI has already shown its potential as a training corpus by improving the performance of supervised systems which add it to existing training data (Taghipour and Ng, 2015a; Iacobacci et al., 2016). Table 1 shows some statistics of the WSD datasets and training corpora which we use in the evaluation framework.…”
Section: Sense-Annotated Training Corpora
confidence: 99%
“…SemCor (Miller et al., 1993) and OMSTI (Taghipour and Ng, 2015a). As sense inventory, we used WordNet 3.0 (Miller et al., 1990) for all open-class parts of speech.…”
Section: Discussion
confidence: 99%
“…• One Million Sense-Tagged Instances (Taghipour and Ng, 2015, OMSTI), a sense-annotated dataset obtained via a semi-automatic approach based on the disambiguation of a parallel corpus, i.e., the United Nations Parallel Corpus, performed by exploiting manually translated word senses. Because OMSTI integrates SemCor to increase coverage, to keep a level playing field we excluded the latter from the corpus.…”
Section: Semantic Network
confidence: 99%