Findings of the Association for Computational Linguistics: EMNLP 2020
DOI: 10.18653/v1/2020.findings-emnlp.434

Robust Backed-off Estimation of Out-of-Vocabulary Embeddings

Abstract: Out-of-vocabulary (OOV) words cause serious problems when solving natural language tasks with a neural network. Existing approaches to this problem resort to using subwords, which are shorter and more ambiguous units than words, in order to represent an OOV word as a bag of subwords. In this study, inspired by the processes for creating words from known words, we propose a robust method of estimating OOV word embeddings by referring to pre-trained word embeddings for known words with similar surfaces to target O…
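The abstract describes backing off to known words whose surface forms resemble the OOV word. A minimal sketch of that idea (not the paper's actual algorithm) is to rank known words by character-level similarity and average the embeddings of the closest ones; the word list, toy vectors, and the `SequenceMatcher` similarity are all assumptions for illustration:

```python
from difflib import SequenceMatcher

# Hypothetical pre-trained embeddings for known words (toy 3-d values).
known_embeddings = {
    "playing":  [0.9, 0.1, 0.3],
    "played":   [0.8, 0.2, 0.3],
    "swimming": [0.1, 0.9, 0.5],
}

def surface_similarity(a, b):
    """Character-level similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()

def backoff_embedding(oov_word, embeddings, k=2):
    """Average the embeddings of the k known words whose surface
    forms are most similar to the OOV word."""
    ranked = sorted(embeddings,
                    key=lambda w: surface_similarity(oov_word, w),
                    reverse=True)
    neighbors = ranked[:k]
    dim = len(next(iter(embeddings.values())))
    return [sum(embeddings[w][i] for w in neighbors) / len(neighbors)
            for i in range(dim)]

# For "plays", the surface-closest known words are "played" and
# "playing", so the estimate lands near their average.
vec = backoff_embedding("plays", known_embeddings)
```

The actual paper additionally handles robustness to noisy matches; this sketch only shows the surface-similarity backoff intuition.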

Cited by 8 publications (8 citation statements)
References 35 publications (55 reference statements)
“…2) Since microblog posts are short and noisy, we practically need more than one post for typing. In fact, the accuracy of Twitter NER is very low (29.7%) for out-of-vocabulary entities (Fukuda et al., 2020).…”
Section: Task Settings
confidence: 96%
“…Embedding Generator Our work is also related to studies on generating embeddings for out-of-vocabulary (OOV) words. In this context, researchers use embeddings of characters or subwords to predict those of unseen words (Pinter et al., 2017; Sasaki et al., 2019; Fukuda et al., 2020). For example, one line of work trains an embedding generator by reconstructing the original representation of each word from its bag of subwords.…”
Section: Related Work
confidence: 99%
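The bag-of-subwords scheme mentioned in the statement above can be sketched as follows: an unseen word is decomposed into character n-grams (with boundary markers, as in fastText-style models), and its embedding is the mean of the known n-gram embeddings. The n-gram table and its toy 2-d vectors are assumptions for illustration, not any cited system's parameters:

```python
def char_ngrams(word, n=3):
    """Character n-grams of the word with boundary markers '<' and '>'."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

# Hypothetical subword embedding table (toy 2-d vectors).
subword_vecs = {
    "<pl": [1.0, 0.0],
    "pla": [0.0, 1.0],
    "lay": [0.5, 0.5],
    "ay>": [0.2, 0.8],
}

def bag_of_subwords_embedding(word, table, dim=2):
    """Mean of the embeddings of the word's known character n-grams;
    zero vector if none of its n-grams are in the table."""
    grams = [g for g in char_ngrams(word) if g in table]
    if not grams:
        return [0.0] * dim
    return [sum(table[g][i] for g in grams) / len(grams)
            for i in range(dim)]
```

A generator in the cited sense would be trained so that this composed vector reconstructs the word's original pre-trained embedding; the sketch shows only the composition step.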
“…Sasaki et al. (2019) progressively improve the generator using an attention mechanism. Fukuda et al. (2020) further leverage similar words to enhance this procedure. Our work significantly differs from the above studies in two aspects.…”
Section: Related Work
confidence: 99%
“…In [23], an iterative mimicking framework that strikes a good balance between word-level and character-level representations of words was proposed to better capture syntactic and semantic similarities. In [24], a method was proposed to estimate OOV embeddings by referring to pre-trained word embeddings of known words with surfaces similar to the target OOVs. In [25], the embeddings of OOVs were determined by their spelling and the contexts in which they appear.…”
Section: Related Work
confidence: 99%