Trends & Methods in Chatbot Evaluation

Casas, Jacky; Tricot, Marc-Olivier; Khaled, Omar Abou; Mugellini, Elena; Cudré-Mauroux, Philippe

doi:10.1145/3395035.3425319

Cited by 36 publications

(19 citation statements)

References 43 publications


“…Alongside technical implementation specifications (F6), analysis of how research literature addresses the evaluation of conversational agents is one of the most frequent subjects of dedicated research, from a general overview of evaluation methods [72] to fully dedicated discussion [19]. We classify the discussion based on two different approaches: analysis on quality characteristics (i.e., what is evaluated) and on evaluation methods and metrics (i.e., how they are evaluated).…”

Section: Quality and Evaluation Methods (F9) | mentioning

confidence: 99%

“…Regarding functional correctness, the most common term in literature is effectiveness. Casas et al. [19] differentiate between functional effectiveness, which includes objective measures like command interpretation accuracy and speech synthesis and generation performance, and human effectiveness, which relates to the human similarity footing dimension described in Section 4.1. Milne-Ives et al. [84] identify the process of service delivery as a general quality characteristic involving both task and communication correctness.…”

Section: Quality and Evaluation Methods (F9) | mentioning

confidence: 99%

“…Concerning performance efficiency as defined by ISO/IEC 25010, few examples are mentioned that might be aligned with its quality sub-characteristics. The most common, shared term in this quality characteristic refers to the time behaviour sub-characteristic, which research generally refers to as performance efficiency [19,80]. In terms of resource utilization, Milne-Ives et al. [84] report cost-effectiveness as a major quality characteristic, meaning the relation between the cost (i.e., the resources) and the effectiveness characteristic depicted before.…”

Section: Quality and Evaluation Methods (F9) | mentioning

confidence: 99%


Conversational Agents in Software Engineering: Survey, Taxonomy and Challenges

Motger, Franch, Marco. 2021. Preprint


The use of natural language interfaces in the field of human-computer interaction is undergoing intense study through dedicated scientific and industrial research. The latest contributions in the field, including deep learning approaches like recurrent neural networks, the potential of context-aware strategies and user-centred design approaches, have brought back the attention of the community to software-based dialogue systems, generally known as conversational agents or chatbots. Nonetheless, and given the novelty of the field, a generic, context-independent overview on the current state of research of conversational agents covering all research perspectives involved is missing. Motivated by this context, this paper reports a survey of the current state of research of conversational agents through a systematic literature review of secondary studies. The conducted research is designed to develop an exhaustive perspective through a clear presentation of the aggregated knowledge published by recent literature within a variety of domains, research focuses and contexts. As a result, this research proposes a holistic taxonomy of the different dimensions involved in the conversational agents' field, which is expected to help researchers and to lay the groundwork for future research in the field of natural language interfaces. CCS Concepts: • General and reference → Surveys and overviews; • Human-centered computing → Natural language interfaces.

“…We employed four evaluation methods, based on (1) in-house; (2) experts; (3) real users; and (4) ISO 9214 standard of usability (effectiveness, efficiency, and satisfaction) [53].…”

Section: Second Experiments | mentioning

confidence: 99%

“…Expert evaluation can determine whether chatbot responses are suitable or natural [53,54]. We fetched the conversation history of users and chatbots during testing.…”

Section: Expert Evaluation | mentioning

confidence: 99%

Mining the Chatbot Brain to Improve COVID-19 Bot Response Accuracy

Ghaleb, Almurtadha, Algarni, et al. 2022. Computers, Materials & Continua


People often communicate with auto-answering tools such as conversational agents due to their 24/7 availability and unbiased responses. However, chatbots are normally designed for specific purposes and areas of experience and cannot answer questions outside their scope. Chatbots employ Natural Language Understanding (NLU) to infer their responses. There is a need for a chatbot that can learn from inquiries and expand its area of experience with time. This chatbot must be able to build profiles representing intended topics in a similar way to the human brain for fast retrieval. This study proposes a methodology to enhance a chatbot's brain functionality by clustering available knowledge bases on sets of related themes and building representative profiles. We used a COVID-19 information dataset to evaluate the proposed methodology. The pandemic has been accompanied by an "infodemic" of fake news. The chatbot was evaluated by a medical doctor and a public trial of 308 real users. Evaluations were obtained and statistically analyzed to measure effectiveness, efficiency, and satisfaction as described by the ISO9214 standard. The proposed COVID-19 chatbot system relieves doctors from answering questions. Chatbots provide an example of the use of technology to handle an infodemic.
