Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL) 2014
DOI: 10.3115/v1/W14-4342

Extrinsic Evaluation of Dialog State Tracking and Predictive Metrics for Dialog Policy Optimization

Abstract: During the recent Dialog State Tracking Challenge (DSTC), a fundamental question was raised: “Would better performance in dialog state tracking translate to better performance of the optimized policy by reinforcement learning?” Also, during the challenge system evaluation, another nontrivial question arose: “Which evaluation metric and schedule would best predict improvement in overall dialog performance?” This paper aims to answer these questions by applying an off-policy reinforcement learning method to the …
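As a rough, hypothetical illustration of the off-policy evaluation idea the abstract invokes, the sketch below estimates a target dialog policy's expected return from a corpus of dialogs collected under a different behavior policy, using per-decision importance sampling. This is a generic textbook estimator, not the paper's specific method; `episodes`, `pi_target`, and `pi_behavior` are invented names.

```python
# Per-decision importance sampling: a standard off-policy estimator, shown
# here only to illustrate evaluating a dialog policy on a fixed corpus.
import numpy as np

def is_return(episodes, pi_target, pi_behavior, gamma=0.99):
    """episodes: list of trajectories, each a list of (state, action, reward).

    pi_target(s, a) and pi_behavior(s, a) return action probabilities.
    """
    estimates = []
    for ep in episodes:
        rho, g = 1.0, 0.0
        for t, (s, a, r) in enumerate(ep):
            rho *= pi_target(s, a) / pi_behavior(s, a)  # cumulative importance ratio
            g += (gamma ** t) * rho * r                 # reweighted discounted reward
        estimates.append(g)
    return float(np.mean(estimates))
```

Averaged over a held-out corpus, this yields an estimate of the reward the target policy would earn, without deploying it with real users.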

Cited by 12 publications (10 citation statements). References 12 publications.
“…Second, in DSTC2, the question of what to measure was posed differently, as "Which evaluation metric and schedule would best predict improvement in overall dialog performance?" (Lee, 2014). The author uses the data to optimize a reinforcement learning-based dialog manager, then runs a regression analysis to see which metrics are the best predictors of end-to-end dialog performance.…”
Section: Challenge Entries and Results (mentioning, confidence: 99%)
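For readers unfamiliar with the regression step described in this citation, a minimal synthetic sketch follows: fit a linear model from per-tracker metric scores to end-to-end dialog reward, then read off which metric carries the most predictive weight. The data, feature names, and coefficients here are fabricated purely for illustration.

```python
# Hypothetical regression of dialog performance on tracker evaluation metrics.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Rows: candidate trackers; columns: synthetic metric scores,
# e.g. [accuracy, 1 - L2, ROC-based score].
X = rng.uniform(size=(20, 3))
y = 2.0 * X[:, 1] + 0.3 * X[:, 0] + rng.normal(scale=0.1, size=20)

model = LinearRegression().fit(X, y)
print("coefficients:", model.coef_)  # larger |coef| => stronger predictor
print("R^2:", model.score(X, y))
```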
“…This has had unforeseen benefits: first, the DSTC data now forms a sort of benchmark for the field, with groups continuing to report results on it after the challenge proper (Lee, 2013; Ma and Fosler-Lussier, 2014b; Zilka and Jurčíček, 2015; Fix and Frezza-Buet, 2015). In addition, the DSTC1-3 corpora have been used to examine which state tracking evaluation metrics correlate with dialog success (Lee, 2014), to perform detailed error analyses of state trackers (Smith, 2014), and for dialog act classification and SLU experimentation (Ma and Fosler-Lussier, 2014a; Ferreira et al., 2015). We encourage future challenges to continue this tradition.…”
Section: Features (mentioning, confidence: 99%)
“…In this paper, we use the L2 metric as the loss function since it is found to be the most influential for dialog system performance (Lee, 2014). The model is hence optimized to minimize the L2 loss function…”
Section: Optimization (mentioning, confidence: 99%)
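As a minimal sketch of the L2 metric this quote refers to, assuming it denotes the squared distance between the tracker's belief distribution over hypotheses and the one-hot ground truth (an assumption, not code from the cited work):

```python
# L2 loss between a predicted belief distribution and a one-hot label.
import numpy as np

def l2_loss(belief, true_idx):
    """belief: predicted distribution over hypotheses; true_idx: correct one."""
    target = np.zeros_like(belief)
    target[true_idx] = 1.0
    return float(np.sum((belief - target) ** 2))

print(l2_loss(np.array([0.7, 0.2, 0.1]), 0))  # 0.09 + 0.04 + 0.01 = 0.14
```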
“…A held-out dialog corpus is used as the testing set, and the estimated cumulative reward for the testing dialogs when following the target DM policy is used as the metric for performance. A similar approach has been taken in evaluating the effect of different dialog state trackers on the end-to-end performance of a DM [15]. The estimation of the Q-function is similar to Algorithm 2.…”
Section: On-Corpus DM Evaluation (mentioning, confidence: 99%)
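The Q-function estimation this citation mentions is not reproduced here; as a stand-in, the following is a tabular fitted-Q-evaluation sketch under stated assumptions (integer state and action ids, a deterministic target policy `pi`), not Algorithm 2 from the cited work.

```python
# Assumed sketch: tabular fitted Q evaluation on a fixed transition corpus.
import numpy as np

def fitted_q_evaluation(transitions, pi, n_states, n_actions,
                        gamma=0.99, iters=100):
    """transitions: list of (s, a, r, s_next, done); pi(s) -> action id."""
    q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        q_new = np.zeros_like(q)
        counts = np.zeros_like(q)
        for s, a, r, s2, done in transitions:
            # Bootstrapped target: immediate reward plus the discounted value
            # of the action the target policy would take next.
            target = r if done else r + gamma * q[s2, pi(s2)]
            q_new[s, a] += target
            counts[s, a] += 1
        # Average targets where data exists; keep old values elsewhere.
        q = np.where(counts > 0, q_new / np.maximum(counts, 1), q)
    return q
```

Averaging `q[s0, pi(s0)]` over the initial states of the held-out dialogs then gives the estimated cumulative reward under the target policy.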