Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018
DOI: 10.18653/v1/n18-1188

LSDSCC: a Large Scale Domain-Specific Conversational Corpus for Response Generation with Diversity Oriented Evaluation Metrics

Abstract: It has been proven that automatic conversational agents can be built up using the End-to-End Neural Response Generation (NRG) framework, and such a data-driven methodology requires a large number of dialog pairs for model training and reasonable evaluation metrics for testing. This paper proposes a Large Scale Domain-Specific Conversational Corpus (LSDSCC) composed of high-quality query-response pairs extracted from the domain-specific online forum, with thorough preprocessing and cleansing procedures. Also, a te…
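As a rough illustration of the kind of preprocessing and cleansing pipeline the abstract refers to, the Python sketch below applies a few common heuristic filters (markup stripping, length bounds, exact-duplicate removal) to raw query-response pairs. The thresholds, filter choices, and example data are assumptions made for illustration only; they are not the procedures actually used to build LSDSCC.

```python
import re

# Hypothetical length bounds; the actual LSDSCC cleansing rules differ.
MIN_TOKENS, MAX_TOKENS = 3, 60

def clean_pairs(raw_pairs):
    """Filter raw (query, response) pairs with simple heuristics:
    strip markup, normalize whitespace, enforce length bounds,
    and drop exact duplicate pairs."""
    seen = set()
    cleaned = []
    for query, response in raw_pairs:
        # Remove HTML-like tags and collapse whitespace.
        query = re.sub(r"\s+", " ", re.sub(r"<[^>]+>", " ", query)).strip()
        response = re.sub(r"\s+", " ", re.sub(r"<[^>]+>", " ", response)).strip()
        q_len, r_len = len(query.split()), len(response.split())
        if not (MIN_TOKENS <= q_len <= MAX_TOKENS and MIN_TOKENS <= r_len <= MAX_TOKENS):
            continue  # too short or too long to be a useful training pair
        key = (query.lower(), response.lower())
        if key in seen:
            continue  # exact duplicate pair
        seen.add(key)
        cleaned.append((query, response))
    return cleaned

# Invented example: the second pair is a duplicate and "ok"/"ok" is too short.
pairs = [
    ("Which GPU should I buy?", "The 1080 Ti is still a solid choice."),
    ("Which GPU should I buy?", "The 1080 Ti is still a solid choice."),
    ("ok", "ok"),
]
print(clean_pairs(pairs))  # keeps only the first pair
```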

Cited by 13 publications (15 citation statements)
References 24 publications
“…Ritter et al. (2011) suggested that an appropriate response should be on the same topic as the utterances. Several other studies have also focused on evaluating the relevance between an utterance and its response (Xu et al., 2018b; Pei and Li, 2018; Lowe et al., 2017b).…”
Section: Criteria for Manual Evaluation (mentioning)
confidence: 99%
“…Distinct is, however, computed across contexts and does not measure whether a model can generate multiple valid responses for a single context. Xu et al. (2018) proposed the Mean Diversity Score (MDS) and Probabilistic Diversity Score (PDS) metrics, which evaluate diversity against groups of multiple retrieved references. Hashimoto et al. (2019) proposed a metric for a unified evaluation of quality and diversity of outputs, which however depends on human judgements.…”
Section: Related Work (mentioning)
confidence: 99%
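To make the corpus-level Distinct metric mentioned in that statement concrete, here is a minimal Python sketch of Distinct-n (Li et al., 2016): the number of unique n-grams divided by the total number of n-grams across a set of generated responses. The MDS and PDS formulations are specific to the LSDSCC paper and are not reproduced here; the example responses below are invented for illustration.

```python
from collections import Counter

def distinct_n(responses, n):
    """Distinct-n: ratio of unique n-grams to total n-grams,
    computed over all generated responses together."""
    ngrams = Counter()
    total = 0
    for resp in responses:
        tokens = resp.split()
        for i in range(len(tokens) - n + 1):
            ngrams[tuple(tokens[i:i + n])] += 1
            total += 1
    return len(ngrams) / total if total > 0 else 0.0

responses = ["i do not know", "i do not know either", "try the new update"]
print(distinct_n(responses, 1))  # unique unigrams / total unigrams
print(distinct_n(responses, 2))  # unique bigrams / total bigrams
```

Because the first two responses share most of their n-grams, both scores stay well below 1.0 even though each response is fluent on its own; this corpus-level behavior is exactly what the quoted statement contrasts with per-context diversity measures such as MDS and PDS.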
“…Automatic evaluation of generative dialog models remains an open research challenge [43]. As a complement to the automatic evaluation, we also present a human evaluation.…”
Section: E. Human Evaluation (mentioning)
confidence: 99%