Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.103

Scaling Hidden Markov Language Models

Abstract: The hidden Markov model (HMM) is a fundamental tool for sequence modeling that cleanly separates the hidden state from the emission structure. However, this separation makes it difficult to fit HMMs to large datasets in modern NLP, and they have fallen out of use due to very poor performance compared to fully observed models. This work revisits the challenge of scaling HMMs to language modeling datasets, taking ideas from recent approaches to neural modeling. We propose methods for scaling HMMs to massive stat…
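For intuition about what fitting an HMM to a language modeling dataset computes, the sketch below shows the standard dense forward algorithm for a sequence's marginal likelihood. This is generic textbook code, not the paper's implementation; the O(K^2)-per-token update it contains is exactly the cost that becomes prohibitive as the number of hidden states K grows, which is the scaling problem the paper addresses.

```python
import numpy as np
from scipy.special import logsumexp

def hmm_log_likelihood(tokens, log_pi, log_A, log_B):
    """Log marginal likelihood of a token sequence under a dense HMM.

    log_pi : (K,)   log initial state distribution
    log_A  : (K, K) log transitions, log_A[i, j] = log p(z_t = j | z_{t-1} = i)
    log_B  : (K, V) log emissions,   log_B[j, w] = log p(x_t = w | z_t = j)
    """
    # Forward recursion in log space: alpha[j] = log p(x_1..x_t, z_t = j).
    alpha = log_pi + log_B[:, tokens[0]]
    for w in tokens[1:]:
        # Dense O(K^2) update per token: the bottleneck that scaling methods
        # (sparse/blocked structure, compact neural parameterization) target.
        alpha = logsumexp(alpha[:, None] + log_A, axis=0) + log_B[:, w]
    return logsumexp(alpha)
```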

Cited by 21 publications (26 citation statements) · References 17 publications

“…When evaluating our model with a large number of symbols, we find that only a small fraction of the symbols are predicted in the parse trees (for example, when our model uses 250 nonterminals, only tens of them are found in the predicted parse trees of the test corpus). We expect that our models can benefit from regularization techniques such as state dropout (Chiu and Rush, 2020).…”
Section: Discussion
confidence: 99%
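The state dropout mentioned in the statement above regularizes a large-state model by training each batch on only a sampled subset of hidden states. The sketch below is a minimal, generic rendering of that idea (sample a keep-set, restrict and renormalize the dense parameters); the exact formulation in Chiu and Rush (2020) may differ.

```python
import numpy as np
from scipy.special import logsumexp

def state_dropout_restrict(log_A, log_B, keep_prob, rng):
    """Keep a random subset of hidden states for one training batch.

    log_A : (K, K) log transition matrix, log_B : (K, V) log emission matrix.
    Returns the kept state indices and the restricted, renormalized parameters.
    """
    K = log_A.shape[0]
    keep = rng.random(K) < keep_prob
    keep[rng.integers(K)] = True          # guarantee at least one surviving state
    kept = np.flatnonzero(keep)

    sub_A = log_A[np.ix_(kept, kept)]
    # Renormalize each row so the restricted transitions remain a distribution.
    sub_A = sub_A - logsumexp(sub_A, axis=1, keepdims=True)
    return kept, sub_A, log_B[kept]
```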
“…For example, the best model from Petrov et al. (2006) contains over 1000 nonterminal and preterminal symbols. We are also motivated by the recent work of Buhai et al. (2019), who show that when learning latent variable models, increasing the number of hidden states is often helpful; and by Chiu and Rush (2020), who show that a neural hidden Markov model with up to 2^16 hidden states can achieve surprisingly good performance in language modeling. A major challenge in employing a large number of nonterminal and preterminal symbols is that representing and parsing with a PCFG has computational complexity that is cubic in the number of symbols.…”
Section: Introduction
confidence: 99%
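For reference, the costs behind the observation above are the standard dense dynamic-programming complexities (textbook figures, not numbers taken from the quoted papers):

```latex
% Dense dynamic programs over a length-T sentence:
%   HMM forward algorithm with K hidden states:            O(T K^2)
%   PCFG inside/CKY algorithm with |N| nonterminals (CNF):  O(T^3 |N|^3)
\underbrace{O(T\,K^{2})}_{\text{HMM forward}}
\qquad \text{vs.} \qquad
\underbrace{O(T^{3}\,|\mathcal{N}|^{3})}_{\text{PCFG inside (CKY)}}
```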
“…For example, Dai et al. (2017), among others, incorporate recurrent units into the hidden semi-Markov model (HSMM) to segment and label high-dimensional time series, while other work learns discrete template structures for conditional text generation using a neuralized HSMM. Wessels and Omlin (2000) and Chiu and Rush (2020) factorize HMMs with neural networks to scale them and improve their sequence modeling capacity. The work most related to ours leverages a neural HMM for sequence labeling (Tran et al., 2016), which maximizes the marginal likelihood of the observations.…”
Section: Related Work
confidence: 99%
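One generic way to "factorize an HMM with neural networks," as described in the statement above, is to produce the transition and emission matrices from learned state and word embeddings, so the parameter count grows with the embedding dimension rather than with K·K or K·V. The PyTorch sketch below illustrates that idea only; it is not the specific parameterization used by any of the cited papers.

```python
import torch
import torch.nn as nn

class FactorizedHMMParams(nn.Module):
    """Embedding-based (neural) parameterization of dense HMM parameters."""

    def __init__(self, num_states: int, vocab_size: int, dim: int):
        super().__init__()
        self.prev_state = nn.Embedding(num_states, dim)  # represents z_{t-1}
        self.next_state = nn.Embedding(num_states, dim)  # represents z_t
        self.word = nn.Embedding(vocab_size, dim)        # represents x_t

    def log_transition(self) -> torch.Tensor:
        # (K, K) log p(z_t | z_{t-1}) from dot products of state embeddings.
        scores = self.prev_state.weight @ self.next_state.weight.T
        return torch.log_softmax(scores, dim=-1)

    def log_emission(self) -> torch.Tensor:
        # (K, V) log p(x_t | z_t) from dot products of state and word embeddings.
        scores = self.next_state.weight @ self.word.weight.T
        return torch.log_softmax(scores, dim=-1)
```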
“…The constrained local attention in Transformer-C is adopted at all layers of models such as Longformer (Beltagy et al., 2020) and Big Bird (Zaheer et al., 2020) due to its sparsity. Our work conceptually resembles that of Chiu and Rush (2020), who modernize HMM language models, as well as simple RNN-based language models (Merity et al., 2018). Our linguistic analysis is inspired by experiments from Khandelwal et al. (2018).…”
Section: Related Work
confidence: 99%