Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181)
DOI: 10.1109/icassp.1998.679707

Retrieval of broadcast news documents with the THISL system

Abstract: This paper describes a spoken document retrieval system, combining the ABBOT large vocabulary continuous speech recognition (LVCSR) system developed by Cambridge University, Sheffield University and Softsound, and the PRISE information retrieval engine developed by NIST. The system was constructed to enable us to participate in the TREC 6 Spoken Document Retrieval experimental evaluation. Our key aims in this work were to produce a complete system for the SDR task, to investigate the effect of a word error rat…

Cited by 26 publications (31 citation statements)
References 10 publications
“…A form of pseudo relevance feedback (known as local context analysis) was used to expand topic texts with additional terms taken from the recognized transcript collection. 6. shef-s1, University of Sheffield with collaborators at Cambridge University (Abberley et al, 1998). Recognition was performed using the Abbot recognizer system with a vocabulary of 65,532 words producing a transcript with a 35.9% WER.…”
Section: Experiments On the Extent Of The Effect Of WER And Rank Posi (mentioning)
confidence: 99%
“…This formulation required that sites perform some automatic segmentation of the full broadcasts into smaller units suitable for retrieval. Using an approach inspired by [6], we performed story segmentation as follows. First we created 30 second segments based on the word recognition time stamps using a 10 second step to create overlapping segment windows.…”
Section: CL-SDR System By University Of Chicago (mentioning)
confidence: 99%
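
The windowing step quoted above (30 second segments advanced in 10 second steps over the recognizer's word time stamps) can be sketched as follows. This is a minimal illustration, not the citing authors' implementation: the function name `window_segments`, the `(token, start_time)` word format, starting the first window at time 0, and skipping empty windows are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical word format: (token, start time in seconds).
Word = Tuple[str, float]
Segment = Tuple[float, float, List[str]]


def window_segments(words: List[Word],
                    window: float = 30.0,
                    step: float = 10.0) -> List[Segment]:
    """Cut a time-stamped transcript into overlapping fixed-length windows.

    Each segment is (start, end, tokens) and holds every word whose
    time stamp t satisfies start <= t < end. Windows begin at 0.0
    (an assumption) and advance by `step` seconds.
    """
    if not words:
        return []
    ordered = sorted(words, key=lambda w: w[1])
    last_time = ordered[-1][1]
    segments: List[Segment] = []
    start = 0.0
    while start <= last_time:
        end = start + window
        tokens = [tok for tok, t in ordered if start <= t < end]
        if tokens:  # skip windows containing no recognized words (assumption)
            segments.append((start, end, tokens))
        start += step
    return segments


if __name__ == "__main__":
    demo = [("good", 0.0), ("evening", 12.0), ("headlines", 25.0), ("tonight", 31.0)]
    for seg in window_segments(demo):
        print(seg)
```

Because the step is shorter than the window, each word falls into up to three segments, so a story boundary that splits one window is still likely to be covered whole by a neighbouring one.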
“…Although a great deal of research has been conducted in the retrieval of corrupted data, be it scanned text [Jones01], speech ([Garofolo99b], [Abberley98]), or translated foreign language documents [Franz98], relatively little work has investigated the notion of varying levels of corruption in the collection(s) being retrieved. This is perhaps surprising, as it is quite reasonable to expect such variation.…”
Section: Past Work (mentioning)
confidence: 99%
“…Spoken document retrieval (SDR) is being used: internally within corporations ([Abberley98], [Renals99], [THISL01]), and as a Web search engine ([SpeechB00A], [SpeechB00B], [SpeechB01]). Word error rates (WER) on the audio documents being retrieved are still relatively high, however, because relevant documents generally contain each query term with a high term frequency (tf), as long as a few of the term occurrences are recognised, relevant documents will be retrieved.…”
Section: Introduction (mentioning)
confidence: 99%
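
The term-frequency argument in the statement above can be made concrete with a toy calculation. The sketch below assumes that each occurrence of a query term is misrecognized independently with probability equal to the WER; real recognition errors are correlated and WER also counts insertions, so this is only a rough illustration. The function name `prob_term_survives` is hypothetical.

```python
def prob_term_survives(tf: int, wer: float) -> float:
    """Probability that at least one of `tf` occurrences of a term is
    recognized correctly, assuming each occurrence is lost independently
    with probability `wer` (a simplifying assumption)."""
    if tf < 0:
        raise ValueError("tf must be non-negative")
    if not 0.0 <= wer <= 1.0:
        raise ValueError("wer must lie in [0, 1]")
    return 1.0 - wer ** tf


if __name__ == "__main__":
    # 0.359 is the WER reported for the shef-s1 transcripts quoted above.
    for tf in (1, 2, 5):
        print(tf, round(prob_term_survives(tf, 0.359), 3))
```

Under this assumption the chance of losing every occurrence shrinks geometrically with the term frequency, which is the intuition behind retrieval remaining usable at relatively high error rates.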