Retrieval of broadcast news documents with the THISL system

Abberley, Dave; Renals, Steve; Cook, Gary

doi:10.1109/icassp.1998.679707

Cited by 26 publications

(31 citation statements)

References 10 publications


“…A form of pseudo relevance feedback (known as local context analysis) was used to expand topic texts with additional terms taken from the recognized transcript collection. 6. shef-s1, University of Sheffield with collaborators at Cambridge University (Abberley et al, 1998). Recognition was performed using the Abbot recognizer system with a vocabulary of 65,532 words producing a transcript with a 35.9% WER.…”

Section: Experiments On the Extent Of The Effect Of WER And Rank Posi (mentioning)

confidence: 99%
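The word error rate (WER) quoted in the statement above is conventionally the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that conventional definition, not the scoring tool used in the cited evaluation; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: both transcripts are tokenized by splitting on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, a WER computed this way can exceed 100% when the hypothesis is much longer than the reference.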

Advances in Information Retrieval

Amati

Carpineto

Romano

2007

Lecture Notes in Computer Science


Abstract. This paper presents a series of analyses and experiments on spoken document retrieval systems: search engines that retrieve transcripts produced by speech recognizers. Results show that transcripts that match queries well tend to be recognized more accurately than transcripts that match a query less well. This result was described in past literature; however, no study or explanation of the effect has been provided until now. This paper provides such an analysis, showing a relationship between word error rate and query length. The paper expands on past research by increasing the number of recognition systems that are tested and by showing the effect in an operational speech retrieval system. Potential future lines of enquiry are also described.


“…This formulation required that sites perform some automatic segmentation of the full broadcasts into smaller units suitable for retrieval. Using an approach inspired by [6], we performed story segmentation as follows. First we created 30 second segments based on the word recognition time stamps using a 10 second step to create overlapping segment windows.…”

Section: CL-SDR System By University Of Chicago (mentioning)

confidence: 99%
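The windowing described in the statement above (30 second segments advanced with a 10 second step over word time stamps) can be sketched as follows. This is an illustrative sketch only, not the cited system's code: the function name `sliding_segments`, the `(start_time, token)` word representation, and the choice to start the first window at time 0 are assumptions, since the statement does not say how the windows were anchored.

```python
from typing import List, Tuple

Word = Tuple[float, str]  # (start time in seconds, recognized token)
Segment = Tuple[float, float, List[str]]  # (window start, window end, tokens)


def sliding_segments(
    words: List[Word], window: float = 30.0, step: float = 10.0
) -> List[Segment]:
    """Group time-stamped words into overlapping fixed-length windows."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if not words:
        return []

    last_time = max(t for t, _ in words)
    segments: List[Segment] = []
    start = 0.0  # assumption: the first window begins at the start of the broadcast
    while start <= last_time:
        end = start + window
        # A word belongs to every window whose half-open span [start, end)
        # contains its time stamp, so neighbouring windows share words.
        tokens = [tok for t, tok in words if start <= t < end]
        if tokens:  # skip windows that fall entirely in silence
            segments.append((start, end, tokens))
        start += step
    return segments
```

With a step shorter than the window, each word lands in up to `window / step` segments, which is what makes the segments overlap.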

The CLEF 2003 Cross-Language Spoken Document Retrieval Track

Federico

Jones

2004

Lecture Notes in Computer Science


Abstract. This paper summarizes the Cross-Language Spoken Document Retrieval (CL-SDR) track held at CLEF 2004. The CL-SDR task at CLEF 2004 was again based on the TREC-8 and TREC-9 SDR tasks. This year the CL-SDR task was extended to explore the unknown story boundaries condition introduced at TREC. The paper reports results from the participants showing that, as expected, cross-language results are reduced relative to a monolingual baseline, although the amount by which they are degraded varies for different topic languages.


“…Although a great deal of research has been conducted in the retrieval of corrupted data, be it scanned text [Jones01], speech ([Garofolo99b], [Abberley98]), or translated foreign language documents [Franz98], relatively little work has investigated the notion of varying levels of corruption in the collection(s) being retrieved. This is perhaps surprising, as it is quite reasonable to expect such variation.…”

Section: Past Work (mentioning)

confidence: 99%

“…Spoken document retrieval (SDR) is being used: internally within corporations ([Abberley98], [Renals99], [THISL01]), and as a Web search engine ([SpeechB00A], [SpeechB00B], [SpeechB01]). Word error rates (WER) on the audio documents being retrieved are still relatively high, however, because relevant documents generally contain each query term with a high term frequency (tf), as long as a few of the term occurrences are recognised, relevant documents will be retrieved.…”

Section: Introduction (mentioning)

confidence: 99%
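The term-frequency argument in the statement above can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that each occurrence of a query term is misrecognised independently with the same probability; neither that independence assumption nor the function name `prob_term_survives` comes from the cited paper, and a WER figure is only a rough stand-in for a per-occurrence miss probability because WER also counts insertions.

```python
def prob_term_survives(tf: int, miss_prob: float) -> float:
    """Probability that at least one of `tf` occurrences is recognised.

    Assumption: occurrences are missed independently, each with
    probability `miss_prob`.
    """
    if tf < 0:
        raise ValueError("tf must be non-negative")
    if not 0.0 <= miss_prob <= 1.0:
        raise ValueError("miss_prob must lie in [0, 1]")
    # All tf occurrences are missed with probability miss_prob ** tf.
    return 1.0 - miss_prob ** tf
```

Under this simplified model the survival probability approaches 1 quickly as `tf` grows, which matches the intuition that documents repeating a query term remain retrievable despite recognition errors.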

Speech and Hand Transcribed Retrieval

Sanderson

Shou

2002

Information Retrieval Techniques for Speech Applications


This paper describes the issues and preliminary work involved in the creation of an information retrieval system that will manage the retrieval from collections composed of both speech recognised and ordinary text documents. In previous work, it has been shown that because of recognition errors, ordinary documents are generally retrieved in preference to recognised ones. Means of correcting or eliminating the observed bias is the subject of this paper. Initial ideas and some preliminary results are presented. General Terms: Measurement, Experimentation.
