Gunrock: A Social Bot for Complex and Engaging Long Conversations

Yu, Dian; Cohn, Michelle; Yang, Yi Mang; Chen, Chun‐Yen; Wen, Weiming; Zhang, Jiaping; Zhou, Mingyang; Jesse, Kevin; Chau, Austin; Bhowmick, Antara; Iyer, Shreenath; Sreenivasulu, Giritheja; Davidson, Sam; Bhandare, Ashwin; Yu, Zhou

doi:10.18653/v1/d19-3014

Cited by 15 publications

(17 citation statements)

References 9 publications

“…Different from previous empirical observations, we conduct a large-scale quantitative and qualitative data analysis of Likert score based ratings. To address the issue of Likert scores, the Alexa team proposed a rule-based ensemble of turn-granularity expert ratings (Yi et al, 2019), and automatic metrics like topical diversity and conversational breadth. ACUTE-EVAL makes a small-scale attempt to use multi-turn pair-wise comparison to rank different chatbots.…”

Section: Related Work (mentioning)

confidence: 99%

Beyond User Self-Reported Likert Scale Ratings: A Comparison Model for Automatic Dialog Evaluation

Liang

Zou

2020

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Self Cite

Open Domain dialog system evaluation is one of the most important challenges in dialog research. Existing automatic evaluation metrics, such as BLEU, are mostly reference-based. They calculate the difference between the generated response and a limited number of available references. Likert-score based self-reported user rating is widely adopted by social conversational systems, such as Amazon Alexa Prize chatbots. However, self-reported user rating suffers from bias and variance among different users. To alleviate this problem, we formulate dialog evaluation as a comparison task. We also propose an automatic evaluation model CMADE (Comparison Model for Automatic Dialog Evaluation) that automatically cleans self-reported user ratings as it trains on them. Specifically, we first use a self-supervised method to learn better dialog feature representation, and then use KNN and Shapley to remove confusing samples. Our experiments show that CMADE achieves 89.2% accuracy in the dialog comparison task. Our implementation is available at https://github.com/Weixin-Liang/dialog_evaluation_CMADE.
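To make the "dialog evaluation as a comparison task" framing concrete, here is a minimal sketch of how Likert-rated dialogs could be turned into ordered pairs and how comparison accuracy could then be computed. It is not the CMADE implementation: the function names, the `min_gap` threshold, the dialog IDs, and the toy scores are all hypothetical, and the KNN and Shapley cleaning steps described in the abstract are not modelled.

```python
from itertools import combinations


def make_comparison_pairs(rated_dialogs, min_gap=1):
    """Turn (dialog_id, likert_rating) records into (better_id, worse_id) pairs.

    Pairs whose ratings differ by less than min_gap are skipped, because
    they carry no clear preference. min_gap is an illustrative assumption.
    """
    pairs = []
    for (id_a, rating_a), (id_b, rating_b) in combinations(rated_dialogs, 2):
        if abs(rating_a - rating_b) < min_gap:
            continue
        if rating_a > rating_b:
            pairs.append((id_a, id_b))
        else:
            pairs.append((id_b, id_a))
    return pairs


def comparison_accuracy(pairs, score):
    """Fraction of pairs in which the model scores the better dialog higher."""
    if not pairs:
        return 0.0
    correct = sum(1 for better, worse in pairs if score[better] > score[worse])
    return correct / len(pairs)


# Toy data: hypothetical dialog IDs with self-reported ratings on a 1-5 scale.
rated = [("d1", 5), ("d2", 3), ("d3", 1), ("d4", 3)]
pairs = make_comparison_pairs(rated)

# Hypothetical model scores; any scoring function could stand in here.
model_score = {"d1": 0.9, "d2": 0.4, "d3": 0.5, "d4": 0.6}
accuracy = comparison_accuracy(pairs, model_score)
```

Working on pairs rather than absolute scores means a rater who is consistently harsh or generous only affects comparisons across raters, which is one plausible reading of why a comparison formulation helps with rating bias.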

“…Spoken Language Understanding (SLU) 1 is at the front-end of many modern intelligent home devices, virtual assistants, and socialbots [1,2]: given a spoken command, an SLU engine should extract relevant semantics 2 from spoken commands for the appropriate downstream tasks. Since SLU tasks such as the Airline Travel Information System (ATIS) [4], the field has progressed from knowledge-based [5] to data-driven approaches, notably those based on neural networks.…”

Section: Introduction (mentioning)

confidence: 99%

“…The fourth author contributed to the work before joining Amazon. 1 SLU typically consists of Automatic Speech Recognition (ASR) and Natural Language Understanding (NLU). ASR maps audio to text, and NLU maps text to semantics.…”

Section: Introduction (mentioning)

confidence: 99%

“…Here, we are interested in learning a mapping directly from raw audio to semantics. 2 Semantic acquisition is commonly framed as Intent Classification (IC) and Slot Labeling/Filling (SL), see [1,2,3]. [10], Intent Classification (IC) and Slot Labeling (SL) are jointly predicted on top of BERT, discarding the need of a Conditional Random Fields (CRF) [11].…”

Section: Introduction (mentioning)

confidence: 99%

Semi-Supervised Spoken Language Understanding via Self-Supervised Speech and Language Model Pretraining

Lai

Chuang

Lee

et al. 2021

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Much recent work on Spoken Language Understanding (SLU) is limited in at least one of three ways: models were trained on oracle text input and neglected ASR errors, models were trained to predict only intents without the slot values, or models were trained on a large amount of in-house data. In this paper, we propose a clean and general framework to learn semantics directly from speech with semi-supervision from transcribed or untranscribed speech to address these issues. Our framework is built upon pretrained end-to-end (E2E) ASR and self-supervised language models, such as BERT, and fine-tuned on a limited amount of target SLU data. We study two semi-supervised settings for the ASR component: supervised pretraining on transcribed speech, and unsupervised pretraining by replacing the ASR encoder with self-supervised speech representations, such as wav2vec. In parallel, we identify two essential criteria for evaluating SLU models: environmental noise-robustness and E2E semantics evaluation. Experiments on ATIS show that our SLU framework with speech as input can perform on par with those using oracle text as input in semantics understanding, even though environmental noise is present and a limited amount of labeled semantics data is available for training.

“…Building on the idea of attention-based seq2seq models (Vaswani et al, 2017), recent language models such as BERT (Devlin et al, 2019) and GPT-2 (Radford et al, 2019) enable neural conversational models to generate responses that appear human-like and engaging (Yu et al, 2019). A closer look, however, reveals that the lack of long-term memory to represent consistent (world) knowledge and personality over multiple speaker turns can lead to incoherent content being generated (Li et al, 2016; Serban et al, 2017).…”

Section: Introduction (mentioning)

confidence: 99%

Space Efficient Context Encoding for Non-Task-Oriented Dialogue Generation with Graph Attention Transformer

Galetzka

Rose

Schlangen

et al. 2021

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Confer

To improve the coherence and knowledge retrieval capabilities of non-task-oriented dialogue systems, recent Transformer-based models aim to integrate fixed background context. This often comes in the form of knowledge graphs, and the integration is done by creating pseudo utterances through paraphrasing knowledge triples, added into the accumulated dialogue context. However, the context length is fixed in these architectures, which restricts how much background or dialogue context can be kept. In this work, we propose a more concise encoding for background context structured in the form of knowledge graphs, by expressing the graph connections through restrictions on the attention weights. The results of our human evaluation show that this encoding reduces space requirements without negative effects on the precision of reproduction of knowledge and perceived consistency. Further, models trained with our proposed context encoding generate dialogues that are judged to be more comprehensive and interesting.
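The idea of "expressing the graph connections through restrictions on the attention weights" can be illustrated with a small sketch. This is an assumption-laden toy, not the authors' model: the function names, the three-node chain, and the rule that a node attends only to itself and its direct neighbours are all hypothetical choices made for illustration.

```python
import math


def graph_attention_mask(num_nodes, edges):
    """Build a boolean mask where mask[i][j] is True if node i may attend to node j.

    Assumption for this sketch: each node attends to itself and to its
    direct (undirected) neighbours in the knowledge graph.
    """
    mask = [[i == j for j in range(num_nodes)] for i in range(num_nodes)]
    for a, b in edges:
        mask[a][b] = True
        mask[b][a] = True
    return mask


def masked_softmax(scores, allowed):
    """Softmax over one row of attention scores; disallowed positions get weight 0."""
    peak = max(s for s, ok in zip(scores, allowed) if ok)
    exps = [math.exp(s - peak) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical triple (subject, relation, object) encoded as nodes 0, 1, 2,
# connected as a chain: subject - relation - object.
edges = [(0, 1), (1, 2)]
mask = graph_attention_mask(3, edges)

# With equal raw scores, node 0 splits its attention between itself and node 1
# and places no weight on node 2, which is not directly connected to it.
weights = masked_softmax([1.0, 1.0, 1.0], mask[0])
```

Because the structure lives in the mask rather than in extra pseudo-utterance tokens, the graph can be represented without lengthening the input, which is the space saving the abstract refers to.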
