Methods for Numeracy-Preserving Word Embeddings

Sundararaman, Dhanasekar; Si, Shijing; Subramanian, Vivek; Wang, Guoyin; Hazarika, Devamanyu; Carin, Lawrence

doi:10.18653/v1/2020.emnlp-main.384

Cited by 27 publications

(44 citation statements)

References 19 publications

“…DICE Deterministic Independent-of-Corpus Embeddings (Sundararaman et al, 2020) is an attempt to handcraft number encoder f so as to preserve the relative magnitude between two numerals and their embeddings. Given two scalars i and j, and their embeddings f(i) and f(j), the cosine distance between f(i) and f(j) is intended to monotonically increase/decrease with the Euclidean distance between i and j.…”

Section: Real-based Methods (mentioning)

confidence: 99%
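
The property described in the statement above can be illustrated with a small sketch. This is a simplified two-dimensional illustration, not the construction from the cited paper: the function names `angle_embed` and `cosine_distance` and the range `[0, 100]` are assumptions made for this example. Each number is mapped to an angle in `[0, π]` and then to a point on the unit circle, so the cosine distance between two embeddings grows as the numbers move further apart.

```python
import math


def angle_embed(x, lo=0.0, hi=100.0):
    """Map a number in [lo, hi] to a unit vector (hypothetical helper).

    The number is rescaled linearly to an angle in [0, pi], so numbers
    outside [lo, hi] cannot be represented and are rejected.
    """
    if not lo <= x <= hi:
        raise ValueError(f"{x} is outside the supported range [{lo}, {hi}]")
    theta = math.pi * (x - lo) / (hi - lo)
    return (math.cos(theta), math.sin(theta))


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Distances from 10 to progressively more distant numbers.
anchor = angle_embed(10)
for other in (20, 50, 90):
    print(other, cosine_distance(anchor, angle_embed(other)))
```

Because the angle between two embeddings is proportional to the absolute difference of the numbers and never exceeds π, the cosine distance here is a monotonically increasing function of that difference. The fixed range `[lo, hi]` is a design choice of this sketch.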

“…Some progress has already been made in this transfer learning setup, e.g., GenBERT (Geva et al, 2020), finetuned on a synthetic dataset of arithmetic problems, is found to score higher on DROP QA. Similarly, DICE (Sundararaman et al, 2020), optimized for numeration, improves score on Numeracy600K order-of-magnitude prediction task. Going forward, we need several such studies, ideally for each pair of tasks to see whether some numeracy skills help models generalize to others.…”

Section: Vision for Unified Numeracy in NLP (mentioning)

confidence: 99%

“…Direction: Some proposed methods are encoder-only, e.g., DICE (Sundararaman et al, 2020), while some can be decoder-only, e.g., those requiring sampling from a parameterized distribution (Spokoyny and Berg-Kirkpatrick, 2020).…”

Section: Real Based (mentioning)

confidence: 99%

“…Model designers must also make a choice on coverage: whether to target a broad or a narrow range of numbers to be represented. Multi-class classification (Zhang et al, 2020) over a fixed number of bins restricts the range of numbers expressed, as do DICE embeddings (Sundararaman et al, 2020). Value embeddings are continuous and theoretically unrestricted, but must practically be capped for bug-free training.…”

Section: Vision for Unified Numeracy in NLP (mentioning)

confidence: 99%
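
The coverage trade-off in the statement above can be sketched as follows. This is an illustrative assumption, not the method of any cited paper: the helper `magnitude_bin` and the constant `NUM_BINS` are hypothetical. Numbers are assigned to a fixed set of order-of-magnitude classes, and anything outside the covered range is clipped to the nearest bin, which is what restricts the range of numbers that can be expressed.

```python
import math

NUM_BINS = 8  # hypothetical: classes for magnitudes 10^0 through 10^7


def magnitude_bin(value, num_bins=NUM_BINS):
    """Return the order-of-magnitude class of a positive number.

    Values below 1 fall into bin 0 and values of 10**num_bins or more
    fall into the last bin, so distinct out-of-range numbers become
    indistinguishable.
    """
    if value <= 0:
        raise ValueError("only positive numbers are supported in this sketch")
    exponent = math.floor(math.log10(value))
    return max(0, min(num_bins - 1, exponent))


for v in (5, 250, 3e12):
    print(v, magnitude_bin(v))
```

With a fixed number of bins, both `3e12` and any larger value land in the same last class, which shows the restricted coverage in miniature.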

Representing Numbers in NLP: a Survey and a Vision

Thawani, Pujara, Ilievski, et al. 2021

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Langua

NLP systems rarely give special consideration to numbers found in text. This starkly contrasts with the consensus in neuroscience that, in the brain, numbers are represented differently from words. We arrange recent NLP work on numeracy into a comprehensive taxonomy of tasks and methods. We break down the subjective notion of numeracy into 7 subtasks, arranged along two dimensions: granularity (exact vs approximate) and units (abstract vs grounded). We analyze the myriad representational choices made by over a dozen previously published number encoders and decoders. We synthesize best practices for representing numbers in text and articulate a vision for holistic numeracy in NLP, comprised of design trade-offs and a unified evaluation.

“…They find "prototype numbers" by clustering, and represent numbers as a weighted average of these prototypes. Sundararaman et al [21] proposed to learn embeddings for numbers, which reflect the distance of two numbers in the number line, independently from words. Table 6 summarizes these previous methods.…”

Section: Data Source Task (mentioning)

confidence: 99%

Mining Numbers in Text: A Survey

Yoshida, Kita, 2021

Information Systems - Intelligent Information Processing Systems, Natural Language Processing, Affective Computing and Artifici

Both words and numerals are tokens found in almost all documents, but they have different properties. However, relatively little attention has been paid to numerals found in text, and many systems have treated the numbers in a document in ad hoc ways, such as regarding them as mere strings in the same way as words, normalizing them to zeros, or simply ignoring them. The recent growth of natural language processing (NLP) research has changed this situation, and more and more attention has been paid to numeracy in documents. In this survey, we provide a quick overview of the history and recent advances in research on mining such relations between numerals and words found in text data.

Arithmetic with language models: From memorization to computation

Maltoni, Ferrara, 2024

Neural Networks
