Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018
DOI: 10.18653/v1/n18-1188

LSDSCC: a Large Scale Domain-Specific Conversational Corpus for Response Generation with Diversity Oriented Evaluation Metrics

Abstract: It has been proven that automatic conversational agents can be built up using the End-to-End Neural Response Generation (NRG) framework, and such a data-driven methodology requires a large number of dialog pairs for model training and reasonable evaluation metrics for testing. This paper proposes a Large Scale Domain-Specific Conversational Corpus (LSDSCC) composed of high-quality query-response pairs extracted from the domain-specific online forum, with thorough preprocessing and cleansing procedures. Also, a te…
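As a rough illustration of the kind of preprocessing and cleansing pipeline the abstract refers to, the Python sketch below applies a few common heuristic filters (markup stripping, length bounds, exact-duplicate removal) to raw query-response pairs. The thresholds, filter choices, and example data are assumptions made for illustration only; they are not the procedures actually used to build LSDSCC.

```python
import re

# Hypothetical length bounds; the actual LSDSCC cleansing rules differ.
MIN_TOKENS, MAX_TOKENS = 3, 60

def clean_pairs(raw_pairs):
    """Filter raw (query, response) pairs with simple heuristics:
    strip markup, normalize whitespace, enforce length bounds,
    and drop exact duplicate pairs."""
    seen = set()
    cleaned = []
    for query, response in raw_pairs:
        # Remove HTML-like tags and collapse whitespace.
        query = re.sub(r"\s+", " ", re.sub(r"<[^>]+>", " ", query)).strip()
        response = re.sub(r"\s+", " ", re.sub(r"<[^>]+>", " ", response)).strip()
        q_len, r_len = len(query.split()), len(response.split())
        if not (MIN_TOKENS <= q_len <= MAX_TOKENS and MIN_TOKENS <= r_len <= MAX_TOKENS):
            continue  # too short or too long to be a useful training pair
        key = (query.lower(), response.lower())
        if key in seen:
            continue  # exact duplicate pair
        seen.add(key)
        cleaned.append((query, response))
    return cleaned

# Invented example: the second pair is a duplicate and "ok"/"ok" is too short.
pairs = [
    ("Which GPU should I buy?", "The 1080 Ti is still a solid choice."),
    ("Which GPU should I buy?", "The 1080 Ti is still a solid choice."),
    ("ok", "ok"),
]
print(clean_pairs(pairs))  # keeps only the first pair
```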

Cited by 13 publications (15 citation statements)
References 24 publications
“…Ritter et al. (2011) suggested that an appropriate response should be on the same topic as the utterances. Several other studies have also focused on evaluating the relevance between an utterance and its response (Xu et al., 2018b; Pei and Li, 2018; Lowe et al., 2017b).…”
Section: Criteria for Manual Evaluation (mentioning)
confidence: 99%
“…Distinct is, however, computed across contexts and does not measure whether a model can generate multiple valid responses for a single context. Xu et al. (2018) proposed the Mean Diversity Score (MDS) and Probabilistic Diversity Score (PDS) metrics, which evaluate diversity against groups of multiple retrieved references. Hashimoto et al. (2019) proposed a metric for a unified evaluation of quality and diversity of outputs, which however depends on human judgements.…”
Section: Related Work (mentioning)
confidence: 99%
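To make the corpus-level Distinct metric mentioned in that statement concrete, here is a minimal Python sketch of Distinct-n (Li et al., 2016): the number of unique n-grams divided by the total number of n-grams across a set of generated responses. The MDS and PDS formulations are specific to the LSDSCC paper and are not reproduced here; the example responses below are invented for illustration.

```python
from collections import Counter

def distinct_n(responses, n):
    """Distinct-n: ratio of unique n-grams to total n-grams,
    computed over all generated responses together."""
    ngrams = Counter()
    total = 0
    for resp in responses:
        tokens = resp.split()
        for i in range(len(tokens) - n + 1):
            ngrams[tuple(tokens[i:i + n])] += 1
            total += 1
    return len(ngrams) / total if total > 0 else 0.0

responses = ["i do not know", "i do not know either", "try the new update"]
print(distinct_n(responses, 1))  # unique unigrams / total unigrams
print(distinct_n(responses, 2))  # unique bigrams / total bigrams
```

Because the first two responses share most of their n-grams, both scores stay well below 1.0 even though each response is fluent on its own; this corpus-level behavior is exactly what the quoted statement contrasts with per-context diversity measures such as MDS and PDS.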
“…Automatic evaluation of generative dialog models remains an open research challenge [43]. As a complement to the automatic evaluation, we also present a human evaluation.…”
Section: E. Human Evaluation (mentioning)
confidence: 99%