Lecture Notes in Computer Science
DOI: 10.1007/978-3-540-71496-5_45

Search of Spoken Documents Retrieves Well Recognized Transcripts

Abstract: This paper presents a series of analyses and experiments on spoken document retrieval systems: search engines that retrieve transcripts produced by speech recognizers. Results show that transcripts that match queries well tend to be recognized more accurately than transcripts that match a query less well. Although this result was described in past literature, no study or explanation of the effect has been provided until now. This paper provides such an analysis, showing a relationship between word error rate an…

Cited by 8 publications (8 citation statements)
References 4 publications
“…While direct comparison of Tables 3 and 4 cannot be made since the segmentation boundaries are slightly different, a general trend can be seen that in the min case retrieval effectiveness is much lower for ASR transcripts, while results are similar for max queries. These results are consistent with the findings in [11] that search items with better WRR are likely to be retrieved at higher ranks.…”
Section: Results; citation type: supporting
confidence: 91%
“…A more interesting and careful examination of the differences in retrieval behaviour of documents with different speech transcription accuracy levels for the results of the TREC 7 SDR task is described in [11], [12]. The analysis of the distribution of the errors shows a general tendency for documents with low WERs to be retrieved at higher ranks, independent of document relevance to the search query.…”
Section: Recognition Word Error Rate and Retrieval; citation type: mentioning
confidence: 99%
“…For spoken content items transcribed with low word error rates (WERs), there will be little impact in matching; items with higher WERs will be more significantly affected. We can then expect that items with lower WERs will appear at higher ranks in the results list, which was indeed observed by [237].…”
Section: Expansion Techniques; citation type: supporting
confidence: 54%
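
The citation statements above all turn on word error rate (WER). As a point of reference, here is a minimal sketch of the usual definition: the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions, not taken from the paper or the citing works, which may use different scoring tools and text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Hypothetical helper for illustration; tokenization is plain
    whitespace splitting, which real scoring tools usually refine.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    hypothesis = "the cat sit on mat"  # one substitution, one deletion
    print(f"WER = {word_error_rate(reference, hypothesis):.3f}")
```

Because insertions are counted, this ratio can exceed 1.0 when the hypothesis is much longer than the reference. The word recognition rate (WRR) mentioned in one of the statements is a related accuracy-style measure; its exact definition in the citing work is not given here.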
“…In [237], it is shown that ASR transcripts that provide good matches with queries tend to have lower WERs than ASR transcripts that match the query less well. Analysis showed that documents that contained a broader range of query words tended to have lower WERs.…”
Section: Interaction of ASR Error and IR; citation type: mentioning
confidence: 99%
“…Because topic classification algorithms that leverage broad patterns of term co-occurrence are available, this approach can yield more robust summaries that are less sensitive than snippets would be to variations in the word error rate. Word error rates in large speech collections typically vary systematically by speaker, so this might also help to minimize the natural bias that has been observed from term-based systems in favor of the clearest speakers [14]. On the other hand, implementing thesaurus-based search alone can make formulation of an initial query challenging for untrained users, and search topics that were not anticipated when the thesaurus was created can be particularly difficult to express.…”
Section: Introduction; citation type: mentioning
confidence: 99%