2010 IEEE Spoken Language Technology Workshop
DOI: 10.1109/slt.2010.5700858

Feature-rich continuous language models for speech recognition

Abstract: State-of-the-art probabilistic models of text such as n-grams require an exponential number of examples as the size of the context grows, a problem that is due to the discrete word representation. We propose to solve this problem by learning a continuous-valued and low-dimensional mapping of words, and base our predictions of the target-word probabilities on the non-linear dynamics of the latent-space representation of the words in the context window. We build on neural network-based language models; by expres…
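The abstract describes mapping discrete words into a continuous, low-dimensional space and predicting the target word from a non-linear transformation of the context representation. As a rough illustration of that family of models (not the paper's exact architecture), the following PyTorch sketch embeds the context words, applies a non-linearity, and outputs a softmax over the vocabulary; the layer sizes and names are illustrative assumptions.

```python
# Minimal sketch of a continuous-space neural language model (assumed
# hyperparameters, not the architecture from the paper).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContinuousLM(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, context_size=4, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)      # discrete word -> continuous vector
        self.hidden = nn.Linear(context_size * embed_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)          # scores over the whole vocabulary

    def forward(self, context_ids):
        # context_ids: (batch, context_size) indices of the words in the context window
        e = self.embed(context_ids).flatten(start_dim=1)      # concatenate the context embeddings
        h = torch.tanh(self.hidden(e))                        # non-linear transform of the latent representation
        return F.log_softmax(self.out(h), dim=-1)             # log P(target word | context)

# Usage: log_probs = ContinuousLM(vocab_size=10000)(torch.randint(0, 10000, (8, 4)))
```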

Cited by 5 publications (3 citation statements); references 15 publications (44 reference statements).
“…Related work Mirowski et al (2010) incorporated syntactic information into neural language models using PoS tags as additional input to LBLs but obtained only a small reduction of the word error rate in a speech recognition task. Similarly, Bian et al (2014) enriched the Continuous Bag-of-Words (CBOW) model of Mikolov et al (2013) by incorporating morphology, PoS tags and entity categories into 600-dimensional word embeddings trained on the Gutenberg dataset, increasing sentence completion accuracy from 41% to 44%.…”
Section: Discussion
confidence: 99%
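The cited work adds PoS tags as additional input features to a log-bilinear-style neural language model. The sketch below shows the general idea of concatenating a PoS-tag embedding with each context word embedding before the non-linear layer; the class name, dimensions, and layer layout are assumptions for illustration, not the configuration used by Mirowski et al. (2010).

```python
# Hedged sketch: PoS tags as extra input features to a neural language model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PosAugmentedLM(nn.Module):
    def __init__(self, vocab_size, n_pos_tags, embed_dim=100, pos_dim=20,
                 context_size=4, hidden_dim=256):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        self.pos_embed = nn.Embedding(n_pos_tags, pos_dim)    # embedding of each context word's PoS tag
        self.hidden = nn.Linear(context_size * (embed_dim + pos_dim), hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, word_ids, pos_ids):
        # word_ids, pos_ids: (batch, context_size)
        w = self.word_embed(word_ids)
        p = self.pos_embed(pos_ids)
        x = torch.cat([w, p], dim=-1).flatten(start_dim=1)    # word + PoS features per context position
        h = torch.tanh(self.hidden(x))
        return F.log_softmax(self.out(h), dim=-1)
```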
“…The first version consisted of 100,000 linear input word embeddings and a two-layer LSTM with 256 hidden units, followed by a softmax over 100,000 output words. The second version contained 4 layers of 512-dimensional LSTMs and 64 extra inputs to the first LSTM, coming from a Latent Dirichlet Allocation (Blei, Ng, and Jordan 2003) topic model, which enables the language model to integrate long-range dependencies in the generated text and capture the general theme of the dialogue (Mirowski et al 2010), following the implementation from (Mikolov and Zweig 2012). The third version relied on pre-trained word embeddings (GloVe, Global Vectors) (Pennington, Socher, and Manning 2014) as inputs, resulting in a larger vocabulary of 250,000 input words (the GloVe word embedding matrix was treated as pre-trained and stayed fixed during training) and only 50,000 output words.…”
Section: Automatic Language Generation in Improvised Theatre
confidence: 99%
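The second system version described above feeds an extra 64-dimensional Latent Dirichlet Allocation topic vector into the first LSTM layer, in the spirit of Mikolov and Zweig (2012). A minimal sketch of that kind of topic-conditioned LSTM language model follows; the sizes and names are illustrative assumptions rather than the cited system's configuration.

```python
# Illustrative sketch only: an LSTM language model whose input at every time
# step is the word embedding concatenated with a fixed-size topic vector.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopicConditionedLSTMLM(nn.Module):
    def __init__(self, vocab_size, embed_dim=512, topic_dim=64,
                 hidden_dim=512, num_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # the topic vector is appended to the input of the first LSTM layer
        self.lstm = nn.LSTM(embed_dim + topic_dim, hidden_dim,
                            num_layers=num_layers, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, word_ids, topic_vec):
        # word_ids: (batch, seq_len); topic_vec: (batch, topic_dim), e.g. an LDA posterior
        e = self.embed(word_ids)
        t = topic_vec.unsqueeze(1).expand(-1, e.size(1), -1)  # broadcast the topic over time
        h, _ = self.lstm(torch.cat([e, t], dim=-1))
        return F.log_softmax(self.out(h), dim=-1)             # per-step next-word log-probabilities
```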
“…Examples that are relevant to our investigation include word-level meta-information such as Part of Speech (POS) or lemmas, and discourse-level information such as the setting in which the speech is delivered (referred to as the social-situational setting) and topic. Past efforts (Mirowski et al., 2010; Chelba, 1997; Shi et al., 2013; Bellegarda, 1998; Heidel et al., 2007) in language modeling have demonstrated that incorporating additional language-related information at different levels can improve the performance of language models. Conventional n-gram language models (Brown et al., 1992; Niesler et al., 1998; Heeman, 1999), however, offer relatively limited possibilities for incorporating meta-information.…”
confidence: 99%
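One conventional, and limited, way for an n-gram model to use discourse-level meta-information such as topic or social-situational setting is to interpolate a general model with a model trained on matching data. The short sketch below illustrates that baseline; the `.prob(word, history)` interface is hypothetical and not taken from any of the cited works.

```python
# Minimal sketch (assumed, not from the cited works): interpolating a general
# n-gram model with a setting- or topic-specific one.
def interpolated_prob(word, history, general_lm, setting_lm, lam=0.7):
    """P(word | history, setting) as a fixed mixture of two n-gram models.

    general_lm and setting_lm are assumed to expose a .prob(word, history)
    method; lam weights the general model against the setting-specific one.
    """
    return lam * general_lm.prob(word, history) + (1.0 - lam) * setting_lm.prob(word, history)
```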