2016
DOI: 10.48550/arxiv.1606.03568
Preprint

Word Sense Disambiguation using a Bidirectional LSTM

Abstract: In this paper we present a clean, yet effective, model for word sense disambiguation. Our approach leverages a bidirectional long short-term memory network which is shared between all words. This enables the model to share statistical strength and to scale well with vocabulary size. The model is trained end-to-end, directly from the raw text to sense labels, and makes effective use of word order. We evaluate our approach on two standard datasets, using identical hyperparameter settings, which are in turn tuned …
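
As a rough illustration of the architecture the abstract describes, here is a minimal sketch (not the authors' released code) of a bidirectional LSTM shared across all words, with sense classification applied at the position of the ambiguous word. The class name, dimensions, and the single global sense inventory are assumptions for the example; the paper itself uses word-specific output layers.

```python
# Minimal sketch, assuming PyTorch; not the authors' implementation.
import torch
import torch.nn as nn

class SharedBiLSTMWSD(nn.Module):
    """One embedding table and one Bi-LSTM shared across all target words."""
    def __init__(self, vocab_size, emb_dim, hidden_dim, n_senses):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, bidirectional=True, batch_first=True)
        # Simplification: a single shared sense classifier; the paper uses a
        # word-specific softmax in the final layer.
        self.classify = nn.Linear(2 * hidden_dim, n_senses)

    def forward(self, token_ids, target_pos):
        # token_ids: (batch, seq_len) word indices; target_pos: (batch,) index
        # of the ambiguous word in each sentence.
        states, _ = self.lstm(self.embed(token_ids))   # (batch, seq_len, 2*hidden)
        batch_idx = torch.arange(token_ids.size(0))
        return self.classify(states[batch_idx, target_pos])  # unnormalized sense scores
```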

Cited by 9 publications (13 citation statements) · References 7 publications (8 reference statements)
“…Bi-LSTM (Kågebäck and Salomonsson, 2016) is a baseline for neural models. Bi-LSTM+att.+LEX+POS (Raganato et al., 2017a) is a multi-task learning framework for WSD, POS tagging, and LEX with a self-attention mechanism, which converts WSD into a sequence learning task.…”
Section: Results
confidence: 99%
“…Recent neural-based methods are devoted to dealing with this problem. Kågebäck and Salomonsson (2016) present a supervised classifier based on a Bi-LSTM, which shares parameters among all word types except the last layer. Raganato et al. (2017a) convert the WSD task into a sequence labeling task, thus building a unified model for all polysemous words.…”
Section: Traditional
confidence: 99%
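
To make the sequence-labeling reformulation mentioned above concrete, a toy example follows; the sense tags are invented for illustration and are not actual WordNet sense keys.

```python
# Illustrative only: WSD recast as tagging every token with a sense label,
# in the spirit of Raganato et al. (2017a). The sense tags below are made up.
tokens     = ["The", "bank", "approved", "the", "loan"]
sense_tags = ["O", "bank%finance", "approve%authorize", "O", "loan%money"]

# A single sequence model is trained on (tokens, sense_tags) pairs covering all
# polysemous words at once, instead of one classifier per ambiguous word.
for tok, tag in zip(tokens, sense_tags):
    print(f"{tok}\t{tag}")
```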
“…Drop-Tag (Kågebäck and Salomonsson, 2016) replaces a token with a <dropped> tag. The tag is subsequently treated just like any other word in the vocabulary and has a corresponding word embedding that is trained.…”
Section: Token Drop Methods
confidence: 99%
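
A minimal sketch of the Drop-Tag idea described above; the function name and dropout rate are illustrative and not taken from the cited implementation.

```python
import random

DROPPED = "<dropped>"  # treated as an ordinary vocabulary item with its own trained embedding

def drop_tag(tokens, p=0.1, rng=random):
    """Replace each input token with the <dropped> tag with probability p."""
    return [DROPPED if rng.random() < p else tok for tok in tokens]

# Example: roughly 30% of tokens get replaced before being fed to the model.
print(drop_tag(["he", "sat", "by", "the", "bank"], p=0.3))
```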
“…Some advantages of using word embeddings are the lower dimensionality compared to bag-of-words and that words close in meaning are closer in the word embedding space. Very recent work, still in preprint, applies a recurrent network of the LSTM (Long Short-Term Memory) type to WSD (Yuan et al., 2016) and a bidirectional LSTM (Kågebäck and Salomonsson, 2016), improving over more traditional supervised learning methods.…”
Section: Related Work
confidence: 99%
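
As a toy illustration of the proximity property mentioned in the statement above; the three-dimensional vectors are invented for the example, whereas real embeddings have hundreds of dimensions.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up low-dimensional vectors; semantically related words end up closer.
emb = {
    "bank":  np.array([0.8, 0.1, 0.3]),
    "money": np.array([0.7, 0.2, 0.4]),
    "tree":  np.array([0.1, 0.9, 0.2]),
}
print(cosine(emb["bank"], emb["money"]))  # higher similarity
print(cosine(emb["bank"], emb["tree"]))   # lower similarity
```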