Companion Publication of the 2020 International Conference on Multimodal Interaction 2020
DOI: 10.1145/3395035.3425319

Trends & Methods in Chatbot Evaluation

Cited by 36 publications
(19 citation statements)
References 43 publications
“…Alongside technical implementation specifications (F6), analysis on how research literature addresses the evaluation of conversational agents is one of the most frequent subjects of dedicated research, from a general overview of evaluation methods [72] to full-dedicated discussion [19]. We classify the discussion based on two different approaches: analysis on quality characteristics (i.e., what is evaluated) and on evaluation methods and metrics (i.e., how they are evaluated).…”
Section: Quality and Evaluation Methods (F9)
Type: mentioning
confidence: 99%
“…Regarding functional correctness, the most common term in literature is effectiveness. Casas et al [19] differentiate between functional effectiveness, which includes objective measures like command interpretation accuracy and speech synthesis and generation performance, and human effectiveness, which relates to the human similarity footing dimension described in Section 4.1. Milne-Ives et al [84] identify the process of service delivery as a general quality characteristic involving both task and communication correctness.…”
Section: Quality and Evaluation Methods (F9)
Type: mentioning
confidence: 99%
“…We employed four evaluation methods, based on (1) in-house; (2) experts; (3) real users; and (4) ISO 9214 standard of usability (effectiveness, efficiency, and satisfaction) [53].…”
Section: Second Experiments
Type: mentioning
confidence: 99%
“…Expert evaluation can determine whether chatbot responses are suitable or natural [53,54]. We fetched the conversation history of users and chatbots during testing.…”
Section: Expert Evaluation
Type: mentioning
confidence: 99%