Proceedings of the First Workshop on NLP for Conversational AI 2019
DOI: 10.18653/v1/w19-4107

DSTC7 Task 1: Noetic End-to-End Response Selection

Abstract: Goal-oriented dialogue in complex domains is an extremely challenging problem and there are relatively few datasets. This task provided two new resources that presented different challenges: one was focused but small, while the other was large but diverse. We also considered several new variations on the next utterance selection problem: (1) increasing the number of candidates, (2) including paraphrases, and (3) not including a correct option in the candidate set. Twenty teams participated, developing a range …

Cited by 36 publications (30 citation statements)
References: 8 publications
“…This is Recall@1 using 99 responses sampled from the test dataset as negatives. This 1-of-100 accuracy metric has been used in previous studies: (Al-Rfou et al, 2016; Henderson et al, 2017; Kumar et al, 2018; Gunasekara et al, 2019). While there is no guarantee that the 99 randomly selected negatives will all be bad responses, the metric nevertheless provides a simple summary of model performance that has been shown to correlate with user-driven quality metrics (Henderson et al, 2017).…”
Section: Response Selection Task (mentioning)
confidence: 99%
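The 1-of-100 accuracy described in the statement above can be sketched roughly as follows. This is a minimal illustration, not the evaluation code of any cited paper: the function name `one_of_100_accuracy`, the `score` callback, and the choice to require a strict win over every negative are all assumptions made for the example.

```python
import random
from typing import Callable, Sequence


def one_of_100_accuracy(
    contexts: Sequence[str],
    responses: Sequence[str],
    score: Callable[[str, str], float],
    num_negatives: int = 99,
    seed: int = 0,
) -> float:
    """Fraction of examples where the true response outscores every sampled negative.

    responses[i] is taken to be the true response for contexts[i] (an assumption
    of this sketch). Negatives are drawn at random from the other responses.
    """
    rng = random.Random(seed)
    hits = 0
    for i, context in enumerate(contexts):
        # Sample negatives from the other test responses (index i is excluded).
        other_indices = [j for j in range(len(responses)) if j != i]
        negatives = [responses[j] for j in rng.sample(other_indices, num_negatives)]
        true_score = score(context, responses[i])
        # Assumed tie-breaking: a tie with any negative counts as a miss.
        if all(true_score > score(context, neg) for neg in negatives):
            hits += 1
    return hits / len(contexts)
```

As the quoted statement notes, a randomly sampled negative may itself be an acceptable response, so a score computed this way is a summary statistic rather than an exact measure of response quality.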
“…To compare with other systems, Table 4 presents the official scores for each team which submitted results for all 8 subtasks of the DSTC7 response selection track. More details can be found in (Gunasekara et al, 2019a). Among 8 subtasks in total, our results (Team 3) rank top 1 on 7 subtasks, rank the second best on subtask 2 of Ubuntu, and overall rank top 1 on both datasets of the response selection challenge.…”
Section: DSTC7 Results (mentioning)
confidence: 92%
“…Recall@N is used by the challenge organizers following Lowe et al (2015), which counts how often the correct answer is within the top N specified by a system. For DSTC7 results in this paper, N is set to 1, 10, 50, due to the large candidate set (100 candidates) (Gunasekara et al, 2019b). Mean Reciprocal Rank (MRR) as a widely used metric from the ranking literature is also used by the challenge organizers (Gunasekara et al, 2019b).…”
Section: DSTC7 Results (mentioning)
confidence: 99%
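Recall@N and Mean Reciprocal Rank, as described in the statement above, can be sketched as follows. This is an illustrative reading of the two definitions, not the official DSTC7 scoring script: the function names are hypothetical, and treating an example whose correct answer is missing from the ranking as contributing zero is an assumption of this sketch.

```python
from typing import Hashable, Sequence


def recall_at_n(
    rankings: Sequence[Sequence[Hashable]],
    gold: Sequence[Hashable],
    n: int,
) -> float:
    """Fraction of examples whose correct candidate is within the top n of the ranking."""
    hits = sum(1 for ranked, answer in zip(rankings, gold) if answer in ranked[:n])
    return hits / len(gold)


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[Hashable]],
    gold: Sequence[Hashable],
) -> float:
    """Average of 1/rank of the correct candidate, with ranks starting at 1."""
    total = 0.0
    for ranked, answer in zip(rankings, gold):
        # Assumption: an answer absent from the ranking contributes 0.
        if answer in ranked:
            total += 1.0 / (ranked.index(answer) + 1)
    return total / len(gold)


# Toy example: three rankings over candidates "a", "b", "c"; the answer is always "a".
toy_rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
toy_gold = ["a", "a", "a"]
print(recall_at_n(toy_rankings, toy_gold, 1))        # correct answer is first in 1 of 3
print(mean_reciprocal_rank(toy_rankings, toy_gold))  # (1 + 1/2 + 1/3) / 3
```

With 100 candidates per example, as in the quoted setup, `recall_at_n` would be called with `n` set to 1, 10, and 50.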