In this paper, we analyze the performance of name finding in the context of a variety of automatic speech recognition (ASR) systems and in the context of one optical character recognition (OCR) system. We explore the effects of word error rate from ASR and OCR, performance as a function of the amount of training data, and for speech, the effect of out-of-vocabulary errors and the loss of punctuation and mixed case.
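For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The sketch below illustrates that standard definition only; the function name `word_error_rate` and the whitespace tokenization are assumptions for illustration, not the scoring procedure used in the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenization is plain whitespace splitting, which is
    an assumption and not the paper's scoring setup.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.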
There has been a long-standing methodology for evaluating work in speech recognition (SR), but until recently no community-wide methodology existed for either natural language (NL) researchers or speech understanding (SU) researchers for evaluating the systems they developed. Recently considerable progress has been made by a number of groups involved in the DARPA Spoken Language Systems (SLS) program to agree on a methodology for comparative evaluation of SLS systems, and that methodology is being used in practice for the first time. This paper gives an overview of the process that was followed in creating a meaningful evaluation mechanism, describes the current mechanism, and presents some directions for future development.
This paper proposes an automatic, essentially domain-independent means of evaluating Spoken Language Systems (SLS) which combines software we have developed for that purpose (the "Comparator") and a set of specifications for answer expressions (the "Common Answer Specification", or CAS). The Comparator checks whether the answer provided by an SLS accords with a canonical answer, returning either true or false. The Common Answer Specification determines the syntax of answer expressions, the minimal content that must be included in them, the data to be included in and excluded from test corpora, and the procedures used by the Comparator. Though some details of the CAS are particular to individual domains, the Comparator software is domain-independent, as is the CAS approach.
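The true-or-false comparison described in this abstract can be pictured with a small sketch. Everything in it is an assumption made for illustration: the abstract does not give the CAS syntax or the Comparator's matching rules, so the code simply treats an answer as a table of rows and ignores row order. The name `answers_match` is hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, Sequence

# Assumption: an answer is a table, represented as an iterable of rows,
# each row a sequence of hashable values. The real CAS format may differ.
Row = Sequence[Hashable]


def answers_match(system_answer: Iterable[Row],
                  canonical_answer: Iterable[Row]) -> bool:
    """Return True when both answers contain the same rows, in any order.

    Hypothetical stand-in for the Comparator's decision; the actual
    procedures are defined by the CAS and are not reproduced here.
    """
    system_rows = Counter(tuple(row) for row in system_answer)
    canonical_rows = Counter(tuple(row) for row in canonical_answer)
    return system_rows == canonical_rows
```

A call such as `answers_match([("AA", 101), ("UA", 7)], [("UA", 7), ("AA", 101)])` would compare two tables that differ only in row order.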
We describe HARC, a system for speech understanding that integrates speech recognition techniques with natural language processing. The integrated system uses statistical pattern recognition to build a lattice of potential words in the input speech. This word lattice is passed to a unification parser to derive all possible associated syntactic structures for these words. The resulting parse structures are passed to a multi-level semantics component for interpretation.
This paper reports recent progress on the development of the Delphi natural language component of the BBN spoken language system for the ATIS domain, focussing on the comparative evaluation performed by NIST in June 1990.
This paper describes Project HOOKAH, a TIPSTER Implementation Project with the Drug Enforcement Administration to extract information from the DEA-6 field report. The paper overviews Project HOOKAH, describes the system architecture and modules, and discusses several lessons that have been learned from this application of TIPSTER technology.
PROJECT HOOKAH
Overview
Project HOOKAH is a TIPSTER Implementation Project with the Drug Enforcement Administration to extract information from DEA field reports in support of populating a database. Its goal is the partial automation of DEA operations by moving information extraction technology into the DEA fileroom, where these reports are currently manually processed.
HOOKAH has been supported by Congressional "Dual Use" funding for transferring TIPSTER technology to civilian agencies. The prototype development effort has been managed by Mary Ellen Okurowski and Boyan Onyshkevych of the Department of Defense. The deployment effort is being jointly managed by DoD and DEA, with DEA responsible for life cycle maintenance.
Domain: DEA-6s
The focus of Project HOOKAH is to improve the processing of the DEA-6 report, a semi-formatted report generated primarily by field agents, as well as legal staff, analysts, and others. DEA-6s are organized into case files, and are composed of multiple sections with varying amounts of formatting. Header fields are normally highly formatted, and indicate the subject, case, date, time, etc. There is a semi-formatted index, which contains references to most subjects to be added to the database and some information about them. There is also unformatted text, where much of the useful information is found.