2004
DOI: 10.1007/978-3-540-30222-3_61
The CLEF 2003 Cross-Language Spoken Document Retrieval Track

Abstract: This paper summarizes the Cross-Language Spoken Document Retrieval (CL-SDR) track held at CLEF 2004. The CL-SDR task at CLEF 2004 was again based on the TREC-8 and TREC-9 SDR tasks. This year the CL-SDR task was extended to explore the unknown story boundaries condition introduced at TREC. The paper reports results from the participants showing that, as expected, cross-language results are reduced relative to a monolingual baseline, although the amount by which they are degraded varies for different to…
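The "reduction relative to a monolingual baseline" can be made concrete with a small, hypothetical calculation. The MAP values below are invented for illustration only; the ~15% figure merely echoes the order of reduction reported by citing work later on this page, not numbers from the paper itself.

```python
# Hypothetical illustration of "reduction relative to a monolingual
# baseline"; the MAP values are invented, not taken from the paper.
def relative_reduction(mono_map: float, cross_map: float) -> float:
    """Percentage drop of cross-language MAP versus the monolingual baseline."""
    return 100.0 * (mono_map - cross_map) / mono_map

# Invented example: monolingual MAP 0.40, cross-language MAP 0.34
print(round(relative_reduction(0.40, 0.34), 1))  # 15.0
```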

Cited by 17 publications (7 citation statements)
References 6 publications
“…In this paper we explore the use of Query Expansion (QE) methods for IR for user-generated content where the information is primarily in the spoken data stream, for which search relies on Spoken Content Retrieval (SCR) techniques. Research on SCR initially investigated IR for planned speech content such as news broadcasts and documentaries [1], [2]. The focus then shifted towards spoken content that is produced spontaneously such as interviews, lectures and TV shows [3].…”
Section: Introduction
confidence: 99%
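The Query Expansion named in this statement is often realized as pseudo-relevance feedback: terms frequent in the top-ranked documents of an initial retrieval run are appended to the query. The sketch below is a minimal illustration under assumed toy data; the corpus, parameters, and term-selection rule are invented, not the cited paper's method.

```python
# Minimal sketch of pseudo-relevance-feedback query expansion.
# All data and parameter choices here are illustrative assumptions.
from collections import Counter

def expand_query(query_terms, ranked_docs, top_k=2, n_new_terms=2):
    """Add the most frequent non-query terms from the top-k retrieved docs."""
    counts = Counter()
    for doc in ranked_docs[:top_k]:
        counts.update(t for t in doc if t not in query_terms)
    new_terms = [t for t, _ in counts.most_common(n_new_terms)]
    return list(query_terms) + new_terms

# Toy "retrieved documents", already ranked by an initial query run
docs = [["broadcast", "news", "radio", "news"],
        ["radio", "interview", "news"],
        ["cooking", "recipe"]]
print(expand_query(["news"], docs))  # ['news', 'radio', 'broadcast']
```

In real SCR systems the expansion terms would be weighted (e.g. by a relevance model) rather than simply appended, but the top-k feedback loop is the same.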
“…Our work focuses on the blip10000 Internet video archive [8]. These videos were uploaded to the social video sharing site blip.tv by 2,237 different uploaders, covering 25 different topics, with varying recording quality and differing lengths. The statistics of Automatic Speech Recognition (ASR) transcripts extracted from these videos are shown in Table I.…”
Section: Introduction
confidence: 99%
“…Earlier speech corpora contained relatively clean audio, often with a single speaker reading from a prepared text, such as the TIMIT collection (Garofolo et al, 1990) or broadcast news corpora, which have been used as data sets for speech retrieval experiments in both TREC (Garofolo et al, 2000) and CLEF (Federico and Jones, 2003), and for Topic Detection and Tracking (Allan et al, 1998). These more formal settings or samples of formal content are useful for the study of acoustic qualities of human speech, but represent a more idealized scenario than practical audio processing tasks of interest today.…”
Section: Related Datasets
confidence: 99%
“…These CLIR tasks were done using topics in several European languages. No metadata was provided in these tasks, but some interesting findings indicate that even with the manually translated queries, the best CLIR performance resulted in a 15% reduction from the monolingual ones (Federico & Jones, 2004), while using dictionary term-by-term translation, this reduction increased to between about 40% and 60%, which highlights the challenge for CLIR over video collections (Federico et al, 2005).…”
Section: Related Work
confidence: 99%
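The dictionary term-by-term translation this statement contrasts with manual translation can be sketched in a few lines: each source-language query term is replaced by its dictionary translations, and out-of-vocabulary terms pass through untranslated. The bilingual dictionary and queries below are made-up examples, not data from the cited evaluations.

```python
# Illustrative sketch of dictionary-based term-by-term query translation
# for CLIR; the dictionary and query are invented examples.
def translate_query(query, dictionary):
    """Replace each source term with its dictionary translations;
    out-of-vocabulary terms are kept untranslated."""
    translated = []
    for term in query:
        translated.extend(dictionary.get(term, [term]))
    return translated

fr_en = {"radio": ["radio"],
         "nouvelles": ["news"],
         "emission": ["broadcast", "show"]}
print(translate_query(["nouvelles", "radio"], fr_en))  # ['news', 'radio']
```

Ambiguous entries (like "emission" above) expand to several candidates, which dilutes the query; this translation ambiguity is one reason term-by-term translation degrades retrieval much more than manual query translation.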
“…From 2002-2004 the Cross-Language Spoken Document Retrieval (CL-SDR) task investigated news story document retrieval using data from the NIST TREC 8-9 Spoken Document Retrieval (SDR) tasks with manually translated queries (Federico & Jones, 2004; Federico, Bertoldi, Levow, & Jones, 2005). The aim of these tasks was to evaluate CLIR systems on noisy automatic transcripts of spoken documents with known story boundaries, which involved the retrieval of American English news broadcasts from both unsegmented and segmented transcripts taken from radio and TV news.…”
Section: Related Work
confidence: 99%