Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2021
DOI: 10.18653/v1/2021.acl-long.414

Language Model Evaluation Beyond Perplexity

Abstract: We propose an alternate approach to quantifying how well language models learn natural language: we ask how well they match the statistical tendencies of natural language. To answer this question, we analyze whether text generated from language models exhibits the statistical tendencies present in the human-generated text on which they were trained. We provide a framework, paired with significance tests, for evaluating the fit of language models to these trends. We find that neural language models appear to learn …
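As a concrete illustration of the framework the abstract describes, the sketch below compares one statistical tendency (sentence length, chosen here as an example) between human and model-generated text and attaches a significance test. The two-sample Kolmogorov-Smirnov test and the length_fit helper are illustrative assumptions, not the paper's actual test statistics.

```python
# Minimal sketch of the evaluation idea in the abstract: compare one
# statistical tendency of model-generated text against the human training
# text, with a significance test attached. The KS test is an illustrative
# stand-in; the paper's own framework and test statistics may differ.
from scipy.stats import ks_2samp

def length_fit(human_sentences, model_sentences, alpha=0.05):
    """Test whether model text matches the human sentence-length distribution."""
    human_lengths = [len(s.split()) for s in human_sentences]
    model_lengths = [len(s.split()) for s in model_sentences]
    stat, p_value = ks_2samp(human_lengths, model_lengths)
    # A small p-value means the two length distributions are distinguishable,
    # i.e. the model has not matched this tendency of its training text.
    return {"ks_statistic": stat, "p_value": p_value,
            "matches": p_value >= alpha}
```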

Cited by 25 publications (15 citation statements)
References: 34 publications
“…As neural networks yield state-of-the-art performance in language modeling tasks, we expect them to also do well with the unigram distribution. In fact, pseudo-text generated by LSTM-based language models reproduces Zipf's law to some extent (Takahashi and Tanaka-Ishii, 2017; Meister and Cotterell, 2021). Thus, we view state-of-the-art LSTM models as a strong baseline.…”
Section: Modeling the Unigram Distribution
Confidence: 99%
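The Zipf's law check this statement refers to can be sketched as follows: estimate the exponent of the rank-frequency curve of generated text and compare it against the exponent on human text. The log-log least-squares fit and the zipf_exponent helper are illustrative assumptions; the cited works use more careful estimation.

```python
# Rough check of whether pseudo-text follows Zipf's law, i.e. unigram
# frequency proportional to 1/rank^s. A least-squares fit in log-log space
# is a common but crude estimator of the exponent s.
from collections import Counter
import numpy as np

def zipf_exponent(tokens):
    """Estimate the Zipf exponent s from a log-log rank-frequency regression."""
    freqs = np.array(sorted(Counter(tokens).values(), reverse=True), dtype=float)
    ranks = np.arange(1, len(freqs) + 1)
    slope, _intercept = np.polyfit(np.log(ranks), np.log(freqs), deg=1)
    # Zipf's law predicts s close to 1 for natural language; comparing the
    # estimate on generated vs. human text measures how well it is reproduced.
    return -slope
```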
“…In this section, we examine the extent to which higher-order statistics of sentences from BERT's prior are well-calibrated to the data it was trained on. This kind of comparison provides a richer sense of what the model has learned or failed to learn than traditional scalar metrics like perplexity (Takahashi and Tanaka-Ishii, 2017; Meister and Cotterell, 2021; Takahashi and Tanaka-Ishii, 2019).…”
Section: Distributional Comparisons
Confidence: 99%
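A minimal sketch of the kind of higher-order comparison this statement describes: measure the divergence between the bigram distributions of text sampled from the model and of its training data. The Jensen-Shannon distance and the bigram_js_distance helper are assumptions chosen for illustration, not the cited work's exact methodology.

```python
# Compare a higher-order statistic (bigram distribution) of sampled text
# against training text. Jensen-Shannon distance is one reasonable choice
# of divergence; the cited work's exact statistics may differ.
from collections import Counter
import numpy as np
from scipy.spatial.distance import jensenshannon

def bigram_js_distance(train_tokens, sample_tokens):
    """Jensen-Shannon distance between two empirical bigram distributions."""
    train_counts = Counter(zip(train_tokens, train_tokens[1:]))
    sample_counts = Counter(zip(sample_tokens, sample_tokens[1:]))
    # Align both distributions over the union of observed bigrams.
    vocab = sorted(set(train_counts) | set(sample_counts))
    p = np.array([train_counts[b] for b in vocab], dtype=float)
    q = np.array([sample_counts[b] for b in vocab], dtype=float)
    # 0 means identical distributions; larger values mean poorer calibration.
    return jensenshannon(p / p.sum(), q / q.sum())
```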
“…ing frameworks (Meister & Cotterell, 2021) to better understand whether the large-scale statistical tendencies of natural language, such as Zipf's law (Zipf, 1949), are captured by LMs. We take a more fine-grained approach, proposing a methodology which draws on instance-level evaluation schemes (Zhong et al., 2021) and the experimental control afforded by artificial corpora (White & Cotterell, 2021; Papadimitriou & Jurafsky, 2020).…”
Section: Related Work
Confidence: 99%
“…Recently, a growing body of work has sought to understand how these language models (LMs) fit the distribution of a language beyond standard measures such as perplexity. Meister & Cotterell (2021), for example, investigated the statistical tendencies of the distribution defined by neural LMs, whereas Kulikov et al. (2021) explored whether they adequately capture the modes of the distribution they attempt to model. At the same time, increased focus has been given to performance on rare or novel events in the data distribution, both for models of natural language (McCoy et al., 2021; Lent et al., 2021; Dudy & Bedrick, 2020; Oren et al., 2019) and neural models more generally (see, for example, Sagawa et al., 2020; D'souza et al., 2021; Blevins & Zettlemoyer, 2020; Czarnowska et al., 2019; Horn & Perona, 2017; Ouyang et al., 2016; Bengio, 2015; Zhu et al., 2014).…”
Section: Introduction
Confidence: 99%