Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016
DOI: 10.18653/v1/n16-1037
A Latent Variable Recurrent Neural Network for Discourse-Driven Language Models

Abstract: This paper presents a novel latent variable recurrent neural network architecture for jointly modeling sequences of words and (possibly latent) discourse relations between adjacent sentences. A recurrent neural network generates individual words, thus reaping the benefits of discriminatively-trained vector representations. The discourse relations are represented with a latent variable, which can be predicted or marginalized, depending on the task. The resulting model can therefore employ a training objective t…
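The architecture the abstract describes, a word-level recurrent generator conditioned on a discrete discourse-relation variable that can be marginalized out when the relation is unobserved, can be illustrated with a short sketch. The code below is an assumption-laden illustration, not the authors' released implementation: the class and parameter names (DiscourseRelationLM, num_relations, utt context vector, and so on) are hypothetical, and conditioning the generator by adding a relation embedding to each word embedding is one simple choice among several.

# Hedged sketch (not the paper's code): an LSTM language model in which each
# sentence is generated conditioned on a discrete discourse relation z. When z is
# unobserved, the per-sentence log-likelihood marginalizes over it:
#   log p(w_1..T | c) = logsumexp_z [ log p(z | c) + sum_t log p(w_t | w_<t, c, z) ]
# Shapes (assumed): words is a (batch, T) LongTensor of word ids including
# start/end tokens; context is a (batch, hid_dim) summary of the previous sentence.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiscourseRelationLM(nn.Module):
    def __init__(self, vocab_size, num_relations, emb_dim=64, hid_dim=128):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, emb_dim)
        self.rel_emb = nn.Embedding(num_relations, emb_dim)   # one embedding per relation
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.word_out = nn.Linear(hid_dim, vocab_size)         # next-word distribution
        self.rel_prior = nn.Linear(hid_dim, num_relations)     # p(z | previous-sentence context)
        self.num_relations = num_relations

    def sentence_logprob(self, words, context, relation):
        """log p(words | context, relation) for one batch of relation ids."""
        # Add the relation embedding to every input word embedding (an assumed,
        # simple way of conditioning the generator on the discourse relation).
        emb = self.word_emb(words[:, :-1]) + self.rel_emb(relation).unsqueeze(1)
        h0 = (context.unsqueeze(0), torch.zeros_like(context).unsqueeze(0))
        out, _ = self.lstm(emb, h0)
        logp = F.log_softmax(self.word_out(out), dim=-1)
        tgt = words[:, 1:]
        return logp.gather(-1, tgt.unsqueeze(-1)).squeeze(-1).sum(dim=1)

    def marginal_logprob(self, words, context):
        """log p(words | context), marginalizing the latent discourse relation."""
        prior = F.log_softmax(self.rel_prior(context), dim=-1)  # log p(z | context)
        terms = []
        for z in range(self.num_relations):
            rel = torch.full((words.size(0),), z, dtype=torch.long, device=words.device)
            terms.append(prior[:, z] + self.sentence_logprob(words, context, rel))
        return torch.logsumexp(torch.stack(terms, dim=-1), dim=-1)

When the discourse relation is annotated rather than latent, the same components support the "predicted" mode the abstract mentions: train on sentence_logprob for the observed relation plus the log-prior term, and at test time take the relation that maximizes the prior-plus-likelihood score.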

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
3
2

Citation Types

0
117
0

Year Published

2017
2017
2022
2022

Publication Types

Select...
5
3

Relationship

3
5

Authors

Journals

citations
Cited by 107 publications
(117 citation statements)
references
References 30 publications
(33 reference statements)
0
117
0
Order By: Relevance
“…The Penn Tree Bank corpus provides discourse relation annotation between the spans of text. We used the preprocessed data by Ji et al (2016b), where the explicit discourse relations are mapped into a dummy relation. Our data splits are the same as those described in the baselines (Ji et al, 2016a,b).…”
Section: Contextual Language Model (mentioning)
confidence: 99%
“…We compare our system with the Recurrent Neural Net (RNNLM) with LSTM unit (Ji et al, 2016a), the Document Contextual Language Model (DCLM) (Ji et al, 2016a) and the Discourse Relation Language Model (DRLM) (Ji et al, 2016b). The RNNLM's architecture is the same as that described in (Mikolov et al, 2013) with sigmoid non-linearity replaced by LSTM.…”
Section: Contextual Language Model (mentioning)
confidence: 99%
“…Some works in DA classification treat each utterance as an independent instance (Julia et al, 2010; Gambäck et al, 2011), which leads to ignoring important long-range dependencies in the dialogue history. Other works have captured inter-utterance relationships using models such as Hidden Markov Models (HMMs) (Stolcke et al, 2000; Surendran and Levow, 2006) or Recurrent Neural Networks (RNNs) (Kalchbrenner and Blunsom, 2013; Ji et al, 2016), where RNNs have been particularly successful.…”
Section: Introduction (mentioning)
confidence: 99%
“…There have been many works on DA classification applied to these two datasets; some focus on textual data (Kalchbrenner and Blunsom, 2013; Stolcke et al, 2000), while others explore speech data (Julia et al, 2010). The classification methods used can be broadly divided into instance-based methods (Julia et al, 2010; Gambäck et al, 2011) and sequence-labeling methods (Stolcke et al, 2000; Kalchbrenner and Blunsom, 2013; Ji et al, 2016; Shen and Lee, 2016; Tran et al, 2017). Instance-based methods treat each utterance as an independent data point, which allows the application of general machine learning models, such as Support Vector Machines.…”
Section: Introduction (mentioning)
confidence: 99%
“…Instance-based methods treat each utterance as an independent data point, which allows the application of general machine learning models, such as Support Vector Machines. Sequence-labeling methods include methods based on Hidden Markov Models (HMMs) (Stolcke et al, 2000) and neural networks (Kalchbrenner and Blunsom, 2013; Ji et al, 2016; Shen and Lee, 2016; Tran et al, 2017).…”
Section: Introduction (mentioning)
confidence: 99%
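As a companion to the citation statements above that contrast dialogue-act (DA) classification approaches, here is a hedged toy sketch of the structural difference between instance-based and sequence-labeling taggers. It is an illustration under assumed names and dimensions (InstanceBasedTagger, SequenceLabelingTagger, utt_dim), not a reimplementation of any cited system; utterances are assumed to already be encoded as fixed-size vectors.

# Hedged toy sketch contrasting the two families of DA classifiers named above;
# module names and dimensions are illustrative assumptions, not any cited system.
import torch.nn as nn

class InstanceBasedTagger(nn.Module):
    """Scores each utterance independently, ignoring the dialogue history."""
    def __init__(self, utt_dim, num_acts):
        super().__init__()
        self.clf = nn.Linear(utt_dim, num_acts)

    def forward(self, utterances):            # (batch, n_utts, utt_dim)
        return self.clf(utterances)           # (batch, n_utts, num_acts)

class SequenceLabelingTagger(nn.Module):
    """Runs an RNN over the dialogue so each tag sees the preceding utterances."""
    def __init__(self, utt_dim, hid_dim, num_acts):
        super().__init__()
        self.rnn = nn.LSTM(utt_dim, hid_dim, batch_first=True)
        self.clf = nn.Linear(hid_dim, num_acts)

    def forward(self, utterances):            # (batch, n_utts, utt_dim)
        states, _ = self.rnn(utterances)
        return self.clf(states)               # (batch, n_utts, num_acts)

The sequence labeler's per-utterance scores depend on earlier utterances through the LSTM state, which is exactly the long-range dependency the first quoted statement says instance-based methods ignore.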