Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/p19-1002

Incremental Transformer with Deliberation Decoder for Document Grounded Conversations

Abstract: Document Grounded Conversations is a task to generate dialogue responses when chatting about the content of a given document. Obviously, document knowledge plays a critical role in Document Grounded Conversations, while existing dialogue models do not exploit this kind of knowledge effectively enough. In this paper, we propose a novel Transformer-based architecture for multi-turn document grounded conversations. In particular, we devise an Incremental Transformer to encode multi-turn utterances along with knowledge…
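The abstract only sketches the architecture, so the following is a minimal PyTorch sketch of the incremental-encoding idea it describes: each dialogue turn is encoded while attending over the accumulated dialogue context and the document knowledge. All names here (IncrementalEncoderLayer, encode_dialogue, d_model, etc.) are illustrative assumptions, not the authors' released code, and masking details are omitted for brevity.

```python
import torch
import torch.nn as nn

class IncrementalEncoderLayer(nn.Module):
    """One illustrative encoder layer: self-attention, then attention over the
    dialogue context so far, then attention over the document knowledge."""
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ctx_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.know_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                nn.Linear(4 * d_model, d_model))
        self.norms = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(4)])

    def forward(self, utt, context, knowledge):
        x = self.norms[0](utt + self.self_attn(utt, utt, utt)[0])          # current turn
        x = self.norms[1](x + self.ctx_attn(x, context, context)[0])       # dialogue history
        x = self.norms[2](x + self.know_attn(x, knowledge, knowledge)[0])  # document knowledge
        return self.norms[3](x + self.ff(x))

def encode_dialogue(turns, knowledge, layer):
    """Fold each turn into a growing context representation, one turn at a time."""
    context = knowledge.new_zeros(knowledge.size(0), 1, knowledge.size(2))  # empty-history seed
    for utt in turns:
        encoded = layer(utt, context, knowledge)
        context = torch.cat([context, encoded], dim=1)  # history grows incrementally
    return context

# toy usage: batch of 2, three 5-token turns, a 20-token document, d_model=256
layer = IncrementalEncoderLayer()
turns = [torch.randn(2, 5, 256) for _ in range(3)]
document = torch.randn(2, 20, 256)
history = encode_dialogue(turns, document, layer)
print(history.shape)  # torch.Size([2, 16, 256])
```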

Citations: cited by 90 publications (85 citation statements)
References: 25 publications (26 reference statements)
“…Performance was evaluated in terms of concept error rate (CER) and concept value error rate (CVER) on the MEDIA test dataset.…”
Section: Results (mentioning)
confidence: 99%
“…The task of a spoken language understanding (SLU) system is to detect fragments of semantic knowledge in speech data. Popular models are made of frames describing relations between entities and their properties [1][2][3]. The SLU system instantiates a predefined set of frame structures, called concepts, that can be mentioned in a sentence or a dialogue turn.…”
Section: Introduction (mentioning)
confidence: 99%
“…The deliberation mechanism has succeeded in improving the performance of single-task learning [39], [40]. Y. Xia et al. proposed deliberation networks for word sequence generation and demonstrated their effectiveness in machine translation and text summarization [39].…”
Section: Related Work (mentioning)
confidence: 99%
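The two-pass "deliberation" idea cited above can be illustrated with a small PyTorch sketch: a first-pass decoder drafts hidden states, and a second-pass decoder re-attends to those draft states together with the encoder memory. The DeliberationDecoder class and its parameters are assumptions for illustration, not the implementation from [39] or from the paper under discussion; causal masks are omitted for brevity.

```python
import torch
import torch.nn as nn

class DeliberationDecoder(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        make_layer = lambda: nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.first_pass = nn.TransformerDecoder(make_layer(), num_layers=n_layers)
        self.second_pass = nn.TransformerDecoder(make_layer(), num_layers=n_layers)

    def forward(self, tgt, memory):
        # pass 1: draft hidden states conditioned on the encoder memory
        draft = self.first_pass(tgt, memory)
        # pass 2: refine by attending to the encoder memory and the draft together
        refined = self.second_pass(tgt, torch.cat([memory, draft], dim=1))
        return refined

# toy usage
dec = DeliberationDecoder()
tgt = torch.randn(2, 7, 256)      # embedded target tokens (teacher forcing)
memory = torch.randn(2, 16, 256)  # encoder output (dialogue history + knowledge)
print(dec(tgt, memory).shape)     # torch.Size([2, 7, 256])
```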
“…trains the Seq2Seq network together with a discriminative classifier that measures the difference between the human-generated response and the machine-generated response, and introduces an approximate embedding layer to solve the non-differentiability caused by sampling-based output decoding in the Seq2Seq generation steps. • BigLM-24: (the code and models are available at https://github.com/lipiji/Guyu) This is a language model with both pre-training and fine-tuning procedures [26]. BigLM-24 is the typical GPT-2 model with 345 million parameters (1024 dimensions, 24 layers, 16 heads).…”
Section: Comparison Models (mentioning)
confidence: 99%
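For reference, the "345 million parameters (1024 dimensions, 24 layers, 16 heads)" figure quoted above corresponds to the GPT-2 medium shape; a quick parameter count with the Hugging Face transformers library (assumed available) is sketched below. The exact count comes out near 355M when embeddings are included; 345M is the commonly quoted size.

```python
from transformers import GPT2Config, GPT2LMHeadModel

# GPT-2 medium shape: 1024-dim embeddings, 24 layers, 16 heads
config = GPT2Config(n_embd=1024, n_layer=24, n_head=16)
model = GPT2LMHeadModel(config)
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.0f}M parameters")  # roughly 355M including embeddings
```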
“…Reference [25] proposed an NMT model via multi-head attention; others were inspired by this paper. Reference [26] proposed an incremental transformer with a deliberation decoder to solve the task of document grounded conversations. Reference [27] proposed a transformer-based model to address open-domain dialogue grounded in multi-turn unstructured text facts.…”
mentioning
confidence: 99%