Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.384
Methods for Numeracy-Preserving Word Embeddings

Abstract: Word embedding models are typically able to capture the semantics of words via the distributional hypothesis, but fail to capture the numerical properties of numbers that appear in a text. This leads to problems with numerical reasoning in tasks such as question answering. We propose a new methodology to assign and learn embeddings for numbers. Our approach creates Deterministic, Independent-of-Corpus Embeddings (referred to as DICE) for numbers, such that their cosine similarity reflects the actual dist…

Cited by 27 publications (44 citation statements)
References 19 publications
“…DICE (Deterministic Independent-of-Corpus Embeddings; Sundararaman et al., 2020) is an attempt to handcraft a number encoder f so as to preserve the relative magnitude between two numerals in their embeddings. Given two scalars i and j, and their embeddings f(i) and f(j), the cosine distance between f(i) and f(j) is intended to increase monotonically with the Euclidean distance between i and j.…”
Section: Real-based Methods
Confidence: 99%
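The monotonic relationship described above can be sketched with a minimal 2-D construction: map each scalar linearly to an angle in [0, π] and embed it on the unit circle, so that cosine similarity between embeddings falls as the numbers move apart on the number line. This is an illustrative sketch only — the paper's DICE embeddings generalize to higher dimensions, and the range bounds and function names here are assumptions.

```python
import numpy as np

def dice_2d(x, lo=0.0, hi=100.0):
    """Map a scalar in [lo, hi] linearly to an angle in [0, pi] and
    embed it on the unit circle. The cosine similarity of two such
    embeddings decreases monotonically with |i - j|."""
    theta = np.pi * (x - lo) / (hi - lo)
    return np.array([np.cos(theta), np.sin(theta)])

def cos_sim(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Numbers closer on the number line get higher cosine similarity.
e10, e20, e90 = dice_2d(10), dice_2d(20), dice_2d(90)
print(cos_sim(e10, e20) > cos_sim(e10, e90))  # prints True
```

Because the angle is a linear function of the scalar, cos_sim(f(i), f(j)) = cos(π·|i−j|/(hi−lo)), which is strictly decreasing in |i−j| over the embedded range — exactly the property the quoted statement attributes to DICE.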
“…Some progress has already been made in this transfer learning setup, e.g., GenBERT (Geva et al., 2020), fine-tuned on a synthetic dataset of arithmetic problems, is found to score higher on DROP QA. Similarly, DICE (Sundararaman et al., 2020), optimized for numeration, improves scores on the Numeracy600K order-of-magnitude prediction task. Going forward, we need several such studies, ideally for each pair of tasks, to see whether some numeracy skills help models generalize to others.…”
Section: Vision for Unified Numeracy in NLP
Confidence: 99%
“…They find "prototype numbers" by clustering, and represent numbers as a weighted average of these prototypes. Sundararaman et al. [21] proposed to learn embeddings for numbers that reflect the distance between two numbers on the number line, independently of words. Table 6 summarizes these previous methods.…”
Section: Data Source Task
Confidence: 99%
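The prototype scheme mentioned in this statement can also be sketched: embed a scalar as a distance-weighted average of a few prototype vectors. The softmax weighting, temperature, and prototype choices below are illustrative assumptions, not the cited paper's exact formulation.

```python
import numpy as np

def prototype_embedding(x, prototypes, proto_vecs, temperature=1.0):
    """Embed scalar x as a weighted average of prototype vectors.
    Weights are a softmax over negative distances to the prototype
    numbers, so nearby prototypes dominate the representation."""
    d = np.abs(np.asarray(prototypes, dtype=float) - x)
    w = np.exp(-d / temperature)
    w /= w.sum()
    return w @ np.asarray(proto_vecs, dtype=float)

# Hypothetical prototypes (e.g., cluster centers) with one-hot vectors:
# a number near prototype 1.0 is embedded close to that prototype's vector.
emb = prototype_embedding(1.0, [1.0, 10.0, 100.0], np.eye(3))
```

With one-hot prototype vectors the output reduces to the weight vector itself, which makes the interpolation between prototypes easy to inspect.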