Conversational agents (CAs) are increasingly ubiquitous and are now commonly used to access medical information. However, we lack systematic data about the quality of advice such agents provide. This paper evaluates CA advice for mental health (MH) questions, a pressing issue given the ongoing mental health crisis. Building on prior work, we define a new method to systematically evaluate mental health responses from CAs. We develop multi-utterance conversational probes derived from two widely used mental health diagnostic surveys, the PHQ-9 (Depression) and the GAD-7 (Anxiety). We evaluate the responses of two text-based chatbots and four voice assistants to determine whether CAs provide relevant responses and treatments. Evaluations were conducted both by clinicians and immersively by trained raters, yielding consistent results across all raters. Although advice and recommendations were generally of low quality, they were better for Crisis probes and for probes concerning symptoms of Anxiety rather than Depression. Responses were slightly better for text-based than for speech-based agents, and when CAs had access to extended dialogue context. Design implications include suggestions for improving responses through clarification sub-dialogues. Responses may also be improved by the incorporation of empathy, although this needs to be combined with effective treatments or advice.