The final stage in the IBM DeepQA pipeline ranks all candidate answers according to their evidence scores and estimates the likelihood that each candidate answer is correct. In DeepQA, this is done with a phase-based machine learning framework that supports successive rounds of data manipulation and model application. We show how this design can be used to address particular challenges that arise in applying machine learning to evidence-based hypothesis evaluation. Our approach facilitates an agile development environment for DeepQA; evidence scoring strategies can be easily introduced, revised, and reconfigured without error-prone manual effort to determine how to combine the various evidence scores. We describe the framework, explain the challenges, and evaluate the gain over a baseline machine learning approach.
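As an illustration of combining heterogeneous evidence scores into a single confidence, a minimal sketch is a logistic function over a weighted sum, with weights learned from training data. This is a hypothetical toy, not IBM's actual DeepQA model; the function names and weights below are assumptions for the example.

```python
import math

def combine_evidence(scores, weights, bias=0.0):
    """Map a weighted sum of evidence scores to a confidence in [0, 1]
    via the logistic function. In a real system the weights would be
    learned; here they are supplied by hand for illustration."""
    z = bias + sum(w * s for w, s in zip(weights, scores))
    return 1.0 / (1.0 + math.exp(-z))

def rank_candidates(candidates, weights):
    """candidates: list of (answer, [evidence scores]) pairs.
    Returns (answer, confidence) pairs sorted highest-confidence first."""
    scored = [(ans, combine_evidence(s, weights)) for ans, s in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

The appeal of a learned combination, as the abstract notes, is that a newly added evidence scorer only contributes one more input dimension; no manual re-tuning of the combination rule is needed.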
Although the majority of evidence analysis in DeepQA is focused on unstructured information (e.g., natural-language documents), several components in the DeepQA system use structured data (e.g., databases, knowledge bases, and ontologies) to generate potential candidate answers or find additional evidence. Structured data analytics are a natural complement to unstructured methods in that they typically cover a narrower range of questions but are more precise within that range. Moreover, structured data that has formal semantics is amenable to logical reasoning techniques that can be used to provide implicit evidence. The DeepQA system does not contain a single monolithic structured data module; instead, it allows for different components to use and integrate structured and semistructured data, with varying degrees of expressivity and formal specificity. This paper is a survey of DeepQA components that use structured data. Areas in which evidence from structured sources has the most impact include typing of answers, application of geospatial and temporal constraints, and the use of formally encoded a priori knowledge of commonly appearing entity types such as countries and U.S. presidents. We present details of the relevant components and demonstrate their end-to-end impact on the IBM Watson system.
Real-time transcription has been shown to be valuable in facilitating non-native speakers' comprehension in real-time communication. Automated speech recognition (ASR) technology is a critical ingredient for its practical deployment. This paper presents a series of studies investigating how the quality of transcripts generated by an ASR system impacts user comprehension and subjective evaluation. Experiments are first presented comparing performance across three transcription conditions: no transcript, a perfect transcript, and a transcript with a word error rate (WER) of 20%. We found that 20% WER is likely the critical point at which transcripts become just acceptable and useful. We then examined a lower WER of 10% (a lower bound for today's state-of-the-art systems) using the same experimental design. The results indicated that at 10% WER, comprehension performance was significantly improved compared to the no-transcript condition. Finally, implications for further system development and design are discussed.
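The WER thresholds studied above follow the standard definition of word error rate: the word-level edit distance (substitutions, insertions, and deletions) between the ASR hypothesis and the reference transcript, divided by the number of reference words. A minimal sketch of that computation:

```python
def word_error_rate(reference, hypothesis):
    """Standard WER: word-level Levenshtein distance between the
    reference and hypothesis transcripts, divided by the number of
    reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming edit-distance table, d[i][j] = distance
    # between the first i reference words and first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match/substitution
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, one substitution in a five-word reference yields the 20% WER condition examined in the first experiment.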
We performed an empirical study to understand the relative contributions of real-time transcription to a non-native speaker's comprehension in audio/video meetings. 48 participants were assigned to 2 presentation modes (audio, audio+video) and 3 transcription modes (no transcript, real-time transcripts in streaming mode, transcripts with all past records) in a 3x2 factorial experimental design. The results suggest that comprehension can be significantly improved in both the audio and audio+video conditions when real-time transcription is provided. The participants also reported positive subjective responses to the presence of real-time transcription in terms of usefulness, preference, and willingness to use such a feature if provided. Participants reported no cognitive-load issues in synthesizing information across modalities. Implications for system development and design, as well as future work using automated speech recognition to provide the transcripts, are discussed.
Real-time transcription generated by automated speech recognition (ASR) technologies with reasonably high accuracy has been demonstrated to be valuable in facilitating non-native speakers' comprehension in real-time communication. Besides recognition errors, automated transcription often introduces a time delay due to technical constraints. This study focuses on how the time delay of transcription impacts non-native speakers' comprehension performance and user experience. The experiment simulated a one-way computer-mediated communication scenario, comparing comprehension performance and user experience across 3 transcription conditions (no transcript; perfect transcripts with a 2-second delay; and transcripts with a 10% word error rate and a 2-second delay). The results showed that participants benefited from transcription with a 2-second time delay, as their comprehension performance in this condition improved compared with the no-transcript condition. However, the delayed transcription was found to have negative effects on user experience. In the final part of the paper, implications for further system development and design are discussed.