Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/p19-1329

Exploring Numeracy in Word Embeddings

Abstract: Word embeddings are now pervasive across NLP subfields as the de-facto method of forming text representations. In this work, we show that existing embedding models are inadequate at constructing representations that capture salient aspects of mathematical meaning for numbers, which is important for language understanding. Numbers are ubiquitous and frequently appear in text. Inspired by cognitive studies on how humans perceive numbers, we develop an analysis framework to test how well word embeddings capture …
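
The abstract describes an analysis framework for testing whether word embeddings capture numerical meaning. The snippet below is a minimal sketch of one such test, not the paper's actual framework: it checks whether cosine similarity between number embeddings tracks numerical proximity. The embedding table here is a random placeholder; in practice the vectors would be looked up from a trained model such as GloVe or word2vec.

```python
# Sketch: does embedding similarity correlate with numerical closeness?
# The embedding table is a random stand-in, so the correlation should be
# near zero here; a numeracy-aware embedding would score higher.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
numbers = list(range(1, 101))
emb = {n: rng.normal(size=50) for n in numbers}  # hypothetical 50-dim vectors

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

sims, closeness = [], []
for i in numbers:
    for j in numbers:
        if i < j:
            sims.append(cosine(emb[i], emb[j]))
            closeness.append(-abs(i - j))  # closer numbers -> larger value

rho, _ = spearmanr(sims, closeness)
print(f"Spearman rho between similarity and numerical proximity: {rho:.3f}")
```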

Cited by 66 publications (83 citation statements)
References 23 publications
“…String Embeddings: Recently, word and token embeddings have been analyzed to see if they record numerical properties (for example, magnitude or sorting order) (Wallace et al, 2019; Naik et al, 2019). This work finds evidence that common embedding approaches are unable to generalize to large numeric ranges, but that character-based embeddings fare better than the rest.…”
Section: Related Work
Mentioning confidence: 99%
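The excerpt above refers to probes for magnitude and ordering. Below is an illustrative magnitude probe under assumed settings, not the cited papers' exact protocol: a linear regressor is fit to predict (log) magnitude from a number's embedding on a small numeric range, then evaluated on a larger held-out range to test extrapolation. Random vectors again stand in for real embeddings, so the probe is expected to fail here by construction.

```python
# Sketch of a magnitude-extrapolation probe (assumed setup, placeholder vectors).
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
emb = {n: rng.normal(size=50) for n in range(1, 2001)}  # hypothetical embeddings

train_nums = list(range(1, 501))      # small-magnitude training range
test_nums = list(range(1001, 2001))   # larger, unseen range

X_train = np.stack([emb[n] for n in train_nums])
X_test = np.stack([emb[n] for n in test_nums])

probe = Ridge(alpha=1.0).fit(X_train, np.log(train_nums))
mae = np.mean(np.abs(probe.predict(X_test) - np.log(test_nums)))
print(f"Mean absolute error on the held-out large-number range: {mae:.2f}")
```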
“…Training predict models on very large corpora has therefore become the de-facto standard approach in distributional semantics research for representing word meaning (Chersoni et al, 2020; Moreo et al, 2019; Naik et al, 2019).…”
Section: Approaches To Modelling Linguistic Distributional Knowledge
Mentioning confidence: 99%
“…As a result, distributional semantics and linguistic-simulation research have developed some different theoretical assumptions on how linguistic distributional knowledge should be modelled. The predominant view in distributional semantics research is based on a tacit "one-size-fits-all" assumption for how distributional information should best fit human data: predict models trained on very large (and noisy) corpora are the de facto standard for forming distributional word representations, regardless of the semantic task being modelled (e.g., Baroni et al, 2014; Naik et al, 2019). The implication of this assumption is that there exists an optimal LDM that is appropriate for modelling all forms of linguistic distributional knowledge in cognition.…”
Section: Approaches To Modelling Linguistic Distributional Knowledge
Mentioning confidence: 99%
“…Analysis of word embeddings and the structure of the learned feature space often reveals interesting language properties and is an important research direction (Köhn, 2015; Bolukbasi et al, 2016; Mimno and Thompson, 2017; Nakashole and Flauger, 2018; Naik et al, 2019; Ethayarajh et al, 2019). We show that graph-based embeddings can be a powerful tool for language analysis.…”
Section: Related Work
Mentioning confidence: 77%