Proceedings of the Sixth Conference on Applied Natural Language Processing - 2000
DOI: 10.3115/974147.974191
Named entity extraction from noisy input

Abstract: In this paper, we analyze the performance of name finding in the context of a variety of automatic speech recognition (ASR) systems and in the context of one optical character recognition (OCR) system. We explore the effects of word error rate from ASR and OCR, performance as a function of the amount of training data, and, for speech, the effect of out-of-vocabulary errors and the loss of punctuation and mixed case.
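The word error rate (WER) that the abstract varies is conventionally defined as the word-level edit distance between the reference transcript and the ASR/OCR output, divided by the reference length. As an illustrative sketch (not code from the paper), it can be computed with a standard Levenshtein dynamic program over words:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    via word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all remaining reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)
```

For example, one substitution plus one insertion against a four-word reference gives a WER of 0.5; production scoring tools additionally normalize case and punctuation before alignment.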

Cited by 73 publications (59 citation statements)
References 4 publications
“…He applied a NER system on transcriptions of broadcast news, and reported that its performance degraded linearly with the word error rate of speech recognition (e.g., missing data, misspelled data and spuriously tagged names). Named entity recognition in speech data has been investigated further, but this related work has focused on either decreasing the error rate when transcribing speech [15,20], on considering different speech transcription hypotheses [11,3], or on the issue of temporal mismatch between training and test data [8]. None of these articles consider exploiting external text sources to improve NER in speech data nor the problem of recovering missing named entities in transcribed speech.…”
Section: Prior Work
confidence: 99%
“…For instance, the Stanford NER system in the CoNLL 2003 shared task on NER in written data reports an F1 value of 87.94% [23]. [13,15] report a degradation of NER performance of 20-25% in F1 value when applying a NER system trained on written data to transcribed speech.…”
Section: Introduction
confidence: 99%
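The F1 values quoted above are entity-level scores: the harmonic mean of precision and recall over predicted entity mentions. As a minimal sketch (assuming entities are compared as exact (span, type) pairs, which is the common CoNLL-style convention, not a detail taken from this paper):

```python
def ner_f1(gold, predicted):
    """Entity-level precision, recall, and F1 over collections of
    (span, type) tuples; an entity counts as correct only on exact match."""
    gold_set, pred_set = set(gold), set(predicted)
    tp = len(gold_set & pred_set)  # exactly matched entities
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

Under this scoring, a misrecognized word inside a name costs both a false negative (the gold entity is missed) and a false positive (a wrong entity is emitted), which is why ASR and OCR errors depress F1 so sharply.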
“…The extraction of named entities from speech has been used with large vocabulary ASR, most notably Broadcast News, associated with the DARPA HUB-4 task [4,39,27], as well as with similar corpora in Chinese [64] or French [20]. Although the speech in these corpora is not, for the most part, spontaneous, the extraction of proper names, locations, and organizations represents a significant advancement in the processing of this type of data.…”
Section: Extracting Meaning From Speech
confidence: 99%
“…There is previous research connecting OCR with information extraction, including [16] and [11], who demonstrate that the quality of information extraction is reduced in the presence of OCR errors. Work involving the extraction of named entities from OCR output includes [12,8].…”
Section: Introduction
confidence: 99%