Interspeech 2010
DOI: 10.21437/interspeech.2010-343
Recurrent neural network based language model

Cited by 3,037 publications (474 citation statements). References 9 publications.
“…We conduct hyperparameter search, model introspection, and ablation studies on the English Penn Treebank (PTB) (Marcus, Santorini, and Marcinkiewicz 1993), utilizing the standard training (0-20), validation (21-22), and test (23-24) splits along with pre-processing by Mikolov et al (2010). With approximately 1m tokens and |V| = 10k, this version has been extensively used by the language modeling community and is publicly available.…”
Section: Methods
confidence: 99%
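The excerpt above refers to the widely used Mikolov-preprocessed Penn Treebank setup (WSJ sections 0-20 for training, 21-22 for validation, 23-24 for testing). As a minimal sketch, assuming the commonly distributed preprocessed files ptb.train.txt, ptb.valid.txt, and ptb.test.txt are available locally (the file names and paths are assumptions, not details from the citing paper), the reported statistics can be checked like this:

```python
# Minimal sketch: load the Mikolov-preprocessed PTB splits and verify the
# statistics quoted above (~1M training tokens, |V| = 10k word types).
# File names are assumed from the commonly distributed preprocessed files.
from collections import Counter
from pathlib import Path

def load_split(path):
    # Each line is a pre-tokenized sentence; rare words are already <unk>.
    return Path(path).read_text().split()

train = load_split("ptb.train.txt")   # WSJ sections 0-20
valid = load_split("ptb.valid.txt")   # WSJ sections 21-22
test = load_split("ptb.test.txt")     # WSJ sections 23-24

vocab = Counter(train)
print(f"train tokens: {len(train):,}")   # roughly 0.9M-1M tokens
print(f"|V| = {len(vocab):,}")           # roughly 10k word types
```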
“…Neural Language Models (NLM) encompass a rich family of neural network architectures for language modeling. Some example architectures include feed-forward (Bengio, Ducharme, and Vincent 2003), recurrent (Mikolov et al 2010), sum-product (Cheng et al 2014), log-bilinear (Mnih and Hinton 2007), and convolutional (Wang et al 2015) networks.…”
Section: Related Work
confidence: 99%
“…We use Penn Treebank Dataset (henceforth PTB) (Taylor, Marcus, and Santorini 2003) with pre-processing in (Mikolov et al 2010) and the War and Peace Dataset (henceforth WP) as the standard benchmarks for character-level language modeling. PTB contains a set of collected 2499 stories designed to allow the extraction of simple predicate and argument structure.…”
Section: Datasets
confidence: 99%
“…We use the lexical semantics model and implementation, created by Jansen, Surdeanu, and Clark (2014), to generate domain-appropriate embeddings for a corpus of elementary science text. The embeddings are learned using the recurrent neural network language model (RNNLM) (Mikolov et al. 2010). Like any language model, an RNNLM estimates the probability of observing a word given the preceding context, but, in this process, it also learns word embeddings into a latent, conceptual space with a fixed number of dimensions.…”
Section: The Pointwise Mutual Information (PMI) Solver
confidence: 99%
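The excerpt above summarizes how an RNNLM works: it predicts the probability of the next word from the preceding context and, in doing so, learns fixed-dimensional word embeddings. The following is a minimal PyTorch sketch of an Elman-style recurrent language model in that spirit; it is not the implementation of Mikolov et al. (2010) or of the citing paper, and the layer dimensions and vocabulary size are illustrative assumptions.

```python
# Minimal sketch of an Elman-style RNN language model: it estimates
# P(w_t | w_1 .. w_{t-1}) via a softmax over the vocabulary, and the
# embedding matrix learned along the way provides word vectors.
import torch
import torch.nn as nn

class RNNLM(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)     # rows become word embeddings
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)          # scores over the vocabulary

    def forward(self, tokens, hidden=None):
        # tokens: (batch, seq_len) integer word ids
        emb = self.embed(tokens)                  # (batch, seq_len, embed_dim)
        states, hidden = self.rnn(emb, hidden)    # (batch, seq_len, hidden_dim)
        logits = self.out(states)                 # (batch, seq_len, vocab_size)
        return logits, hidden

# Usage: next-word probabilities for each position in a toy batch.
model = RNNLM(vocab_size=10000)
batch = torch.randint(0, 10000, (2, 5))           # two sequences of five word ids
logits, _ = model(batch)
probs = torch.softmax(logits, dim=-1)             # P(next word | preceding context)
embeddings = model.embed.weight                   # (10000, 100) learned word vectors
```

The rows of the embedding matrix are the learned word vectors that work like the one quoted above reuses as features in a latent, fixed-dimensional conceptual space.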