Proceedings of the 15th Conference of the European Chapter of The Association for Computational Linguistics: Volume 1 2017
DOI: 10.18653/v1/e17-1086

ShotgunWSD: An unsupervised algorithm for global word sense disambiguation inspired by DNA sequencing

Abstract: In this paper, we present a novel unsupervised algorithm for word sense disambiguation (WSD) at the document level. Our algorithm is inspired by a widely-used approach in the field of genetics for whole genome sequencing, known as the Shotgun sequencing technique. The proposed WSD algorithm is based on three main steps. First, a brute-force WSD algorithm is applied to short context windows (up to 10 words) selected from the document in order to generate a short list of likely sense configurations for each wind…
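
The first step named in the abstract, brute-force disambiguation of a short context window, can be illustrated with a minimal sketch. It is not the authors' implementation: the toy glosses, the `gloss_overlap` relatedness function, and the `keep` parameter are assumptions made only for illustration. The sketch enumerates every combination of candidate senses in a window, scores each by summed pairwise relatedness, and keeps the highest-scoring configurations.

```python
from itertools import combinations, product

# Hypothetical toy glosses; a real system would draw them from WordNet.
TOY_GLOSSES = {
    "bank#1": {"financial", "institution", "money"},
    "bank#2": {"river", "side", "land"},
    "deposit#1": {"money", "account", "financial"},
    "deposit#2": {"sediment", "river", "layer"},
}


def gloss_overlap(sense_a, sense_b):
    """Toy relatedness (assumption): number of gloss words two senses share."""
    return len(TOY_GLOSSES[sense_a] & TOY_GLOSSES[sense_b])


def top_configurations(window_senses, relatedness, keep=3):
    """Score every sense configuration of a window and return the best ones.

    window_senses holds one list of candidate senses per word in the window.
    """
    scored = []
    for config in product(*window_senses):
        # Sum relatedness over all unordered pairs of senses in the configuration.
        score = sum(relatedness(a, b) for a, b in combinations(config, 2))
        scored.append((score, config))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:keep]


window = [["bank#1", "bank#2"], ["deposit#1", "deposit#2"]]
best = top_configurations(window, gloss_overlap, keep=2)
print(best)
```

The number of configurations grows as the product of the sense counts, which is presumably why the abstract limits windows to about 10 words.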

Cited by 8 publications (16 citation statements)
References 30 publications

“…A score is assigned to each sense configuration by computing the semantic relatedness between word senses (steps 16-19), as described by Patwardhan et al [33]. Butnaru et al [25] alternatively employed two measures to compute the semantic relatedness, one is the extended Lesk measure [31], [32] and the other is a simple approach based on deriving sense embeddings from word embeddings [36]. In this paper, we propose a third approach that is based on clustering word vectors with k-means and on eliminating the smaller clusters (which contain outlier words).…”
Section: Methods (mentioning)
confidence: 99%
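
The clustering idea in the statement above (group word vectors with k-means, then discard the smaller clusters as outliers) might look roughly like the following sketch. The citing paper's actual settings are not given here, so the deterministic initialisation from the first `k` points, the `min_size` threshold, and the toy vectors are all assumptions.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_assign(points, k, iterations=10):
    """Plain k-means; centroids start at the first k points (assumption)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members (if it has any).
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    return assignment


def drop_small_clusters(words, vectors, k=2, min_size=2):
    """Keep only words whose cluster has at least min_size members.

    min_size is a hypothetical threshold, not a value from the source.
    """
    assignment = kmeans_assign(vectors, k)
    sizes = [assignment.count(c) for c in range(k)]
    return [w for w, a in zip(words, assignment) if sizes[a] >= min_size]


words = ["money", "zebra", "cash", "loan"]
vectors = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0)]
kept = drop_small_clusters(words, vectors)
print(kept)
```

Here "zebra" ends up alone in its cluster and is dropped, while the three finance-related words are kept.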
“…In this paper, we present an improved version of a recently introduced WSD algorithm [25], termed ShotgunWSD, 1 which stems from the Shotgun genome sequencing technique [26], [27]. ShotgunWSD is unsupervised, but it also requires knowledge in the form of WordNet synsets and relations [28], [29].…”
Section: Introduction (mentioning)
confidence: 99%
“…Based on WordNet, we form the sense bag for a given synset by collecting the words found in the gloss of the synset (examples included) as well as the words found in the glosses of semantically related synsets. The semantic relations are chosen based on the part-of-speech of the target word, as described in (Butnaru et al, 2017). To derive the sense embedding, we embed the collected words in an embedding space and compute the median of the resulted word vectors.…”
Section: Feature Extraction (mentioning)
confidence: 99%
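
The sense-embedding step quoted above (embed the words of a sense bag and take the median of their vectors) can be sketched as follows. The statement does not say how the median of vectors is computed; a component-wise median is assumed here, and the toy vocabulary and the choice to skip out-of-vocabulary words are likewise assumptions.

```python
from statistics import median


def sense_embedding(sense_bag, word_vectors):
    """Component-wise median of the vectors of the words in a sense bag.

    Words without a vector are skipped (assumption).
    """
    vectors = [word_vectors[w] for w in sense_bag if w in word_vectors]
    if not vectors:
        raise ValueError("no word in the sense bag has an embedding")
    # zip(*vectors) groups the values of each dimension together.
    return [median(component) for component in zip(*vectors)]


# Hypothetical 2-dimensional word vectors for illustration only.
toy_vectors = {
    "money": [1.0, 0.0],
    "deposit": [3.0, 2.0],
    "institution": [2.0, 10.0],
}
bag = ["money", "deposit", "institution", "unseenword"]
embedding = sense_embedding(bag, toy_vectors)
print(embedding)
```

Unlike a mean, the median is not pulled far by a single extreme value such as the 10.0 in the second dimension.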