Background: The demand for mental health services in the community continues to exceed supply. At the same time, technological developments make the use of artificial intelligence-empowered Conversational Agents (CAs) a real possibility for helping to fill this gap.
Objective: The objective of this review is to identify existing empathic CA design architectures within the mental healthcare sector and to assess their technical performance in terms of classification accuracy. In addition, the approaches used to evaluate the acceptability of empathic CAs to users within the mental healthcare sector are assessed. Finally, this review aims to identify limitations and future directions for empathic CAs in mental healthcare.

Methods: A systematic literature search was conducted across six academic databases to identify journal articles and conference proceedings, using search terms covering three topics: 'conversational agents', 'mental health', and 'empathy'. Only studies discussing CA interventions in the mental healthcare domain were eligible for this review, with both textual and vocal characteristics considered as possible data inputs. Study quality was assessed using appropriate risk-of-bias and quality tools.

Results: A total of 19 articles met all inclusion and exclusion criteria. The terms used to identify a CA varied: 'chatbot' (47%), 'conversational agent' (32%), 'dialog system' (11%), 'virtual assistant' (5%), and 'conversational AI agent' (5%). Transformer-based (37%) and hybrid (26%) engines were the most commonly employed designs. A technical evaluation of CA performance was reported in 17 of the 19 papers reviewed. While a variety of single-engine CAs achieved good accuracy (F1 scores >95%), superior accuracy was achieved by hybrid engines, which were able to provide more nuanced responses. However, human evaluations of CAs were less positive. Only five (26%) of the 19 studies referred to an explicit definition of empathy, and 16 (84%) of the selected studies used human evaluation to assess the effectiveness of the CA designs. Only five studies evaluated the empathy of the CA directly, using questionnaire responses, response ratings, and in-depth interview responses.
The human evaluation of CAs was performed mostly by end-users (75%), with experts in Mental Health (MH) assessing CAs in the remaining studies (25%). A variety of measures were used to evaluate the level of empathy exhibited by CAs. For example, Patient Health Questionnaires were used to show an improvement in mean mood from 5.79 to 7.38 on a 10-point scale for one particular CA. In three other studies, empathic response ratings of 56%, 75%, and 79% were recorded, while further studies reported an average empathy score of 2.85 and an average emotional relevance score of 3.05 out of 5, respectively.
Conclusions: CAs with good technical and empathic performance are now available to users of mental healthcare services.