2021
DOI: 10.1145/3451160

How Am I Doing?: Evaluating Conversational Search Systems Offline

Abstract: As conversational agents like Siri and Alexa gain in popularity and use, conversation is becoming a more and more important mode of interaction for search. Conversational search shares some features with traditional search, but differs in some important respects: conversational search systems are less likely to return ranked lists of results (a SERP), more likely to involve iterated interactions, and more likely to feature longer, well-formed user queries in the form of natural language questions. Because of t…

Cited by 36 publications (23 citation statements). References 32 publications (37 reference statements).
“…User simulation has been widely leveraged in the past for training the dialogue state tracking component of conversational agents using reinforcement learning algorithms, either via agenda-based or model-based simulation [19]. The highly interactive nature of conversational information access systems has also sparked renewed interest in evaluation using user simulation within the IR community [4,5,23,36,38,53]. Recently, Zhang and Balog [53] proposed a general framework for evaluating conversational recommender systems using user simulation.…”
Section: Discussion (mentioning)
confidence: 99%
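To make simulation-based evaluation concrete, below is a minimal, hypothetical sketch of the general idea: an agenda-based simulated user interacts with a (mocked) conversational system, and the evaluator counts how many turns the system needs to satisfy the user's information needs. All names here (`SimulatedUser`, `mock_system`, `evaluate`) are illustrative assumptions, not the framework of Zhang and Balog or any cited system.

```python
class SimulatedUser:
    """Toy agenda-based user simulator: the user holds an ordered list
    of information needs (an agenda) and asks about them one by one."""

    def __init__(self, agenda):
        self.agenda = list(agenda)

    def next_utterance(self):
        # The next unsatisfied need, or None once the agenda is empty.
        return self.agenda[0] if self.agenda else None

    def observe(self, response, need):
        # Mark the current need satisfied if the response covers it.
        if need in response:
            self.agenda.pop(0)


def mock_system(utterance):
    """Stand-in conversational system that always answers on-topic."""
    return f"Here is information about {utterance}."


def evaluate(system, user, max_turns=10):
    """Run the simulated dialogue; return turns taken to empty the agenda."""
    for turn in range(1, max_turns + 1):
        need = user.next_utterance()
        if need is None:
            return turn - 1  # all needs satisfied
        user.observe(system(need), need)
    return max_turns  # budget exhausted


turns = evaluate(mock_system, SimulatedUser(["flights", "hotels", "weather"]))
print(turns)  # one successful turn per information need -> prints 3
```

A real simulator would replace `mock_system` with the system under test and use a learned or agenda-based user model; the point of the sketch is only the evaluation loop: no human in the loop, and the metric falls out of the simulated interaction.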
“…It is important to note that it is also possible to succeed while leaving users frustrated, as studied by Feild et al (2010). A particular end-to-end evaluation approach was recently presented by Lipani et al (2021), based on the flow of different subtopics within a conversation.…”
Section: Metrics for End-to-End Evaluation (mentioning)
confidence: 99%
“…For this reason, researchers adopt human-in-the-loop techniques to mimic human-computer interactions, and further perform human annotation to evaluate the whole system's performance (in response to human). Recent work of Lipani et al [30] propose a metric for offline evaluation of conversational search systems based on user interaction model.…”
Section: Related Work (mentioning)
confidence: 99%
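As a toy illustration of how a user interaction model can underpin an offline metric: weight the gain obtained at each conversational turn by the probability, under the user model, that the user reaches that turn. The geometric continuation model and the numbers below are assumptions for illustration only, not the metric proposed by Lipani et al.

```python
def session_metric(turn_gains, continuation_prob=0.8):
    """Expected gain under a simple user model: after each turn the
    simulated user continues with probability `continuation_prob`,
    so turn t is weighted by continuation_prob ** (t - 1)."""
    expected = 0.0
    reach = 1.0  # probability the user reaches turn 1
    for gain in turn_gains:
        expected += reach * gain
        reach *= continuation_prob
    return expected


# A conversation whose per-turn relevance improves over three turns:
print(round(session_metric([0.2, 0.5, 1.0]), 3))
```

Because later turns are discounted by the chance the user has already left, a system that satisfies the need early scores higher than one that takes many turns to reach the same gains, which is the intuition behind interaction-model-based offline evaluation.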