Explainability is essential for users to effectively understand, trust, and manage powerful artificial intelligence applications.
How can end users efficiently influence the predictions that machine learning systems make on their behalf? This paper presents Explanatory Debugging, an approach in which the system explains to users how it made each of its predictions, and the user then explains any necessary corrections back to the learning system. We present the principles underlying this approach and a prototype instantiating it. An empirical evaluation shows that Explanatory Debugging increased participants' understanding of the learning system by 52% and allowed participants to correct its mistakes up to twice as efficiently as participants using a traditional learning system.
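To make the explain-and-correct loop concrete: the paper's prototype pairs a naive Bayes text classifier with explanations of the words behind each prediction, which users can then adjust directly. The following is a minimal sketch under that assumption; the class `ExplainableNB`, its method names, and the multiplier-based `correct` mechanism are illustrative inventions, not the paper's actual implementation.

```python
# Minimal sketch of an explain-and-correct loop for a multinomial naive
# Bayes text classifier. All names and the boost mechanism are
# illustrative assumptions, not the paper's implementation.
import math
from collections import Counter, defaultdict

class ExplainableNB:
    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.label_counts = Counter()            # label -> document count
        self.user_boosts = defaultdict(dict)     # label -> {word: multiplier}

    def train(self, text, label):
        self.label_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def _vocab_size(self):
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        return max(len(vocab), 1)

    def _word_weight(self, word, label):
        # Laplace-smoothed likelihood, scaled by any user-supplied boost
        # (boosts must stay positive).
        count = self.word_counts[label][word] + 1
        total = sum(self.word_counts[label].values()) + self._vocab_size()
        return (count / total) * self.user_boosts[label].get(word, 1.0)

    def predict_with_explanation(self, text):
        scores, evidence = {}, {}
        total_docs = sum(self.label_counts.values())
        for label in self.label_counts:
            contrib = {w: math.log(self._word_weight(w, label))
                       for w in text.lower().split()}
            scores[label] = (math.log(self.label_counts[label] / total_docs)
                             + sum(contrib.values()))
            evidence[label] = contrib
        best = max(scores, key=scores.get)
        # The "explanation": the words that most supported the prediction,
        # so the user can spot a faulty rule at a glance.
        top = sorted(evidence[best], key=evidence[best].get, reverse=True)[:5]
        return best, top

    def correct(self, label, word, multiplier):
        # The user "explains back": make this word count more (>1) or
        # less (<1) toward the given label.
        self.user_boosts[label][word] = multiplier
```

In use, a correction is a single call, e.g. `clf.correct("fun", "friday", 3.0)` after the explanation reveals that "friday" is underweighted; the next prediction immediately reflects the change, which is what makes the debugging loop interactive.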
Abstract: Clinical decision support systems (CDSS) are increasingly used by healthcare professionals for evidence-based diagnosis and treatment support. However, research has suggested that users often over-rely on system suggestions, even when the suggestions are wrong. Providing explanations could potentially mitigate misplaced trust in the system and over-reliance. In this paper, we explore how explanations relate to user trust and reliance, as well as what information users would find helpful for judging the reliability of a system's decision-making. We investigated these questions through an exploratory user study in which healthcare professionals were observed using a CDSS prototype to diagnose hypothetical cases of fictional patients suffering from a balance-related disorder. Our results show that the level of system confidence had only a slight effect on trust and reliance. More importantly, giving a fuller explanation of the facts used in making a diagnosis had a positive effect on trust but also led to over-reliance issues, whereas less detailed explanations made participants question the system's reliability and led to self-reliance problems. To assess the reliability of the system's decisions, study participants wanted better explanations that would help them interpret the system's confidence, verify that the disorder fit the suggestion, understand the reasoning chain of the decision model, and make differential diagnoses. Our work is a first step toward improved CDSS design that better supports clinicians in making correct diagnoses.
Abstract: Research is emerging on how end users can correct mistakes their intelligent agents make, but before users can correctly "debug" an intelligent agent, they need some degree of understanding of how it works. In this paper we consider ways intelligent agents should explain themselves to end users, focusing especially on how the soundness and completeness of the explanations impact the fidelity of end users' mental models. Our findings suggest that completeness is more important than soundness: increasing completeness via certain information types improved participants' mental models and, surprisingly, their perception of the cost/benefit tradeoff of attending to the explanations. We also found that oversimplification, common in many commercial agents, can be a problem: when soundness was very low, participants experienced more mental demand and lost trust in the explanations, reducing the likelihood that users will pay attention to such explanations at all.
Although machine learning is becoming commonly used in today's software, there has been little research into how end users might interact with machine learning systems beyond communicating simple "right/wrong" judgments. If users could work hand-in-hand with machine learning systems, both their understanding and trust of the system and the accuracy of the learning system could improve. We conducted three experiments to understand the potential for rich interactions between users and machine learning systems. The first experiment was a think-aloud study that investigated users' willingness to interact with machine learning reasoning and the kinds of feedback users might give to machine learning systems. We then investigated the viability of introducing such feedback into machine learning systems: specifically, how to incorporate some of these types of user feedback into machine learning systems, and what their impact was on the accuracy of the system. Taken together, the results of our experiments show that supporting rich interactions between users and machine learning systems is feasible for both user and machine. This shows the potential of rich human-computer collaboration via on-the-spot interactions as a promising direction for machine learning systems and users to collaboratively share intelligence.
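One way to picture the second step, incorporating rich feedback into a learner, is to treat feedback as a transformation of the training data before an ordinary classifier is fit. The sketch below is our own hedged illustration, not the paper's method: it assumes feedback arrives as (action, word) pairs such as ("remove", "deadline") or ("emphasize", "party").

```python
# Illustrative sketch: folding rich user feedback into count-based
# training data. The feedback vocabulary ("remove"/"emphasize") and the
# token-duplication trick are assumptions for this example only.
def apply_feedback(documents, feedback):
    """Rewrite tokenized documents according to user feedback.

    documents: list of token lists, e.g. [["pizza", "party"], ...]
    feedback:  list of (action, word) pairs
    """
    removed = {w for a, w in feedback if a == "remove"}
    emphasized = {w for a, w in feedback if a == "emphasize"}
    adjusted = []
    for tokens in documents:
        new_tokens = []
        for t in tokens:
            if t in removed:
                continue              # user flagged this word as misleading
            new_tokens.append(t)
            if t in emphasized:
                new_tokens.append(t)  # duplicating a token doubles its count
        adjusted.append(new_tokens)
    return adjusted
```

Duplicating or dropping tokens is a deliberately crude stand-in for feature reweighting in count-based models; the point is that "this word matters" and "ignore this word" style feedback can reach the learner without changing the learning algorithm itself.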
…their un-debuggability, and their inability to "explain" their decisions in a human-understandable and reconstructable way. So while AlphaGo or DeepStack can crush the best humans at Go or Poker, neither program has any internal model of its task; their representations defy interpretation by humans, there is no mechanism to explain their actions and behaviour, and, furthermore, there is no obvious instructional value … the high-performance systems cannot help humans improve. Even when we understand the underlying mathematical scaffolding of current machine learning architectures, it is often impossible to get insight into the internal working of the models; we need explicit modeling and reasoning tools to explain how and why a result was achieved. We also know that a significant challenge for future AI is contextual adaptation, i.e., systems that incrementally help to construct explanatory models for solving real-world problems. Here it would be beneficial not to exclude human expertise, but to augment human intelligence with artificial intelligence.
In recent years, research into gender differences has established that individual differences in how people problem-solve often cluster by gender. Research also shows that these differences have direct implications for software that aims to support users' problem-solving activities, and that much of this software is more supportive of the problem-solving processes favored (statistically) more by males than by females. However, there is almost no work considering how software practitioners, such as User Experience (UX) professionals or software developers, can find gender-inclusiveness issues like these in their software. To address this gap, we devised the GenderMag method for evaluating problem-solving software from a gender-inclusiveness perspective. The method includes a set of faceted personas that bring five facets of gender difference research to life, and embeds use of the personas into a concrete process through a gender-specialized Cognitive Walkthrough. Our empirical results show that a variety of practitioners who design software, without needing any background in gender research, were able to use the GenderMag method to find gender-inclusiveness issues in problem-solving software. Our results also show that the issues the practitioners found were real and fixable. This work is the first systematic method to find gender-inclusiveness issues in software, so that practitioners can design and produce problem-solving software that is more usable by everyone.
Categories and Subject Descriptors: H.5.2. Information interfaces and presentation (e.g., HCI): User Interfaces; H.5.m. Information interfaces and presentation (e.g., HCI): Miscellaneous.
Additional Keywords: Gender; gender HCI; diversity; problem-solving software; GenderMag.
Research Highlights: We discuss five facets of prior gender research with ties to males' and females' usage of problem-solving software. We present GenderMag, the first systematic method to evaluate gender-inclusiveness issues in problem-solving software, and show how it draws upon and encapsulates these five facets. We present three qualitative empirical studies that were used to inform and validate various aspects of GenderMag, and show the kinds of issues participants found and how the gender of the evaluator interacted with usage of the method.
Many machine-learning algorithms learn rules of behavior from individual end users, such as task-oriented desktop organizers and handwriting recognizers. These rules form a "program" that tells the computer what to do when future inputs arrive. Little research has explored how an end user can debug these programs when they make mistakes. We present our progress toward enabling end users to debug these learned programs via a Natural Programming methodology. We began with a formative study exploring how users reason about and correct a text-classification program. From the results, we derived and prototyped a concept based on "explanatory debugging", then empirically evaluated it. Our results contribute methods for exposing a learned program's logic to end users and for eliciting user corrections to improve the program's predictions.