Due to their advantages over conventional n-gram language models, recurrent neural network language models (RNNLMs) have recently attracted a fair amount of research attention in the speech recognition community. In this paper, we explore one advantage of RNNLMs, namely, the ease with which they allow the integration of additional knowledge sources. We concentrate on features that provide complementary information with respect to the lexical identities of the words. We refer to such information as meta-information. We single out three cases and investigate their merits by means of N-best list re-scoring experiments on a challenging corpus of spoken Dutch (referred to as CGN) as well as on the English Wall Street Journal (WSJ) corpus. First, we look at part-of-speech (POS) tags and lemmas, two sources of word-level linguistic information that are known to contribute to the performance of conventional language models. We confirm that RNNLMs can benefit from these sources as well. Second, we investigate socio-situational settings (SSSs) and topics, two sources of discourse-level information that are also known to benefit language models. SSSs are present in the CGN data and can be seen as a proxy for the language register. For the purposes of our investigation, we assume that information on the SSS can be captured at the moment at which speech is recorded. Topics, i.e., treatments of different subjects, are present in the WSJ data. To predict POS tags, lemmas, SSSs, and topics, a second RNNLM is coupled to the main RNNLM. We refer to this architecture as a recurrent neural network tandem language model (RNNTLM). Our experimental findings show that if high-quality meta-information labels are available, both word-level and discourse-level information improve the performance of language models. Third, we investigate sentence length and word length (i.e., token size), two sources of intrinsic information that are readily available for exploitation because they are known at the time of re-scoring. Intrinsic information has been largely overlooked in language modeling research. The results of experiments on both the CGN and WSJ data show that integrating sentence length and word length improves performance, and RNNLMs allow these features to be incorporated with ease.