How and where biometric systems are deployed will depend on their performance. Knowing what to ask and how to decipher the answers can help you evaluate the performance of these emerging technologies.
This paper reports results obtained in benchmark tests conducted within the ARPA Spoken Language program in November and December of 1993. In addition to ARPA contractors, participants included a number of "volunteers", among them foreign participants from Canada, France, Germany, and the United Kingdom. The body of the paper is limited to an outline of the structure of the tests and presents highlights and discussion of selected results. Detailed tabulations of reported "official" results and additional explanatory text appear in the Appendix.

2. WSJ-CSR TESTS

2.1. New Conditions

All sites participating in the WSJ-CSR tests were required to submit results for at least one of two "Hub" tests. The Hub tests were intended to measure basic speaker-independent performance on either a 64K-word (Hub 1) or 5K-word (Hub 2) read-speech test set; they required use of either a "standard" 20K trigram (Hub 1) or 5K bigram (Hub 2) grammar, as well as use of standard training sets. These requirements were intended to facilitate meaningful cross-site comparisons.

The "Spoke" tests were intended to support a number of different challenges. Spokes 1, 3, and 4 addressed various types of adaptation: incremental supervised language model adaptation (Spoke 1), rapid-enrollment speaker adaptation for "recognition outliers" (i.e., non-native speakers) (Spoke 3), and incremental speaker adaptation (Spoke 4). [There were no participants in what had been planned as Spoke 2.] Spokes 5 through 8 addressed noise and channel compensation: unsupervised channel compensation (Spoke 5), "known microphone" adaptation for two different microphones (Spoke 6), unsupervised channel compensation for two different environments (Spoke 7), and use of a noise compensation algorithm with a known alternate microphone for data collected in environments with competing "calibrated" noise (radio talk shows or music) (Spoke 8). Spoke 9 involved spontaneous "dictation-style" speech. Additional details are given in Kubala et al. [1], on behalf of members of the ARPA Continuous Speech Recognition Corpus Coordinating Committee (CCCC).

2.2. WSJ-CSR Summary Highlights

The design of the "Hub and Spoke" test paradigm was such that opportunities abounded for informative contrasts (e.g., the use of bigram vs. trigram grammars, the enabling or disabling of supervised vs. unsupervised adaptation strategies, etc.). Nine sites participated in the Hub 1 tests and five sites participated in the Hub 2 tests, and some sites reported results for more than one system or research team. The lowest word error rate in the Hub 1 baseline condition was achieved by the French CNRS-LIMSI group [2,3]. Application of statistical significance tests indicated that the performance differences between this system and a system
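The figure compared across these Hub and Spoke conditions is the word error rate (WER): the number of word substitutions, insertions, and deletions needed to turn a recognizer's hypothesis into the reference transcription, divided by the number of reference words. The sketch below is illustrative only and is not code from the benchmark infrastructure; the function name and example sentences are hypothetical.

import sys

def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (Levenshtein) divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i          # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j          # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub,              # substitution (or match)
                          d[i - 1][j] + 1,  # deletion
                          d[i][j - 1] + 1)  # insertion
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

if __name__ == "__main__":
    ref = "the stock market closed higher today"
    hyp = "the stock market closed high today"
    # One substitution out of six reference words -> WER of about 16.7%.
    print(f"WER: {word_error_rate(ref, hyp):.2%}", file=sys.stdout)

Per-system error rates of this kind are the quantities that the cross-system statistical significance tests mentioned above compare.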
We introduce four principles for explainable artificial intelligence (AI) that comprise fundamental properties for explainable AI systems. We propose that explainable AI systems deliver accompanying evidence or reasons for outcomes and processes; provide explanations that are understandable to individual users; provide explanations that correctly reflect the system's process for generating the output; and operate only under conditions for which the system was designed and when it reaches sufficient confidence in its output. We term these four principles explanation, meaningful, explanation accuracy, and knowledge limits, respectively. Through significant stakeholder engagement, these four principles were developed to encompass the multidisciplinary nature of explainable AI, including the fields of computer science, engineering, and psychology. Because one-size-fits-all explanations do not exist, different users will require different types of explanations. We present five categories of explanation and summarize theories of explainable AI. We give an overview of the algorithms in the field that cover the major classes of explainable algorithms. As a baseline comparison, we assess how well explanations provided by people follow our four principles. This assessment provides insights into the challenges of designing explainable AI systems.