2009
DOI: 10.2197/ipsjjip.17.82
Construction of a Test Collection for Spoken Document Retrieval from Lecture Audio Data

Abstract: The lecture is one of the most valuable genres of audiovisual data. Though spoken document processing is a promising technology for utilizing the lecture in various ways, it is difficult to evaluate because the evaluation requires a subjective judgment and/or the verification of large quantities of evaluation data. In this paper, a test collection for the evaluation of spoken lecture retrieval is reported. The test collection consists of the target spoken documents of about 2,700 lectures (604 hours) taken from…

Cited by 10 publications (2 citation statements)
References: 8 publications
“…On the other hand, the Spoken Document Processing Working Group, which is part of the special interest group of spoken language processing (SIG-SLP) of the Information Processing Society of Japan, has already developed prototypes of SDR test collections: the Corpus of Spontaneous Japanese (CSJ) Spoken Term Detection test collection [2] and CSJ Spoken Document Retrieval test collection [3]. The target documents of both test collections are spoken lectures in CSJ [4].…”
Section: Introduction (mentioning)
confidence: 99%
“…Using characters as features is possible in Japanese owing to the presence of kanji, ideograms originating from Chinese characters that represent not only sounds but also meanings. The use of words or characters has also been investigated for spoken document retrieval [16], [17], and better performance was obtained when using words than when using characters. However, the spoken inquiries in our topic classification task are much shorter than spoken documents; hence we are also interested in evaluating spoken inquiries.…”
Section: Introduction (mentioning)
confidence: 99%