Proceedings of the 12th Annual ACM International Conference on Multimedia 2004
DOI: 10.1145/1027527.1027613

The relative effectiveness of concept-based versus content-based video retrieval

Abstract: Three video search systems were compared in the interactive search task at the TRECVID 2003 workshop: a text-only system, which searched video shots through transcripts; a features-only system, which searched video shots through 16 video content features (e.g., airplanes and people); and a combined system, which searched through both transcripts and content features. 36 participants each completed 12 video search tasks. The hypothesis that the combined system would perform better than both the text-only and th…

Cited by 21 publications (33 citation statements)
References 6 publications
“…Automatic Speech Recognition (ASR) technology has been developed to turn audio into text (Christel et al, 1998) and to provide textual description of the video content. Even though the quality of the ASR transcript is usually not as good as the human generated video description, they are still the primary data resource for shot level video retrieval systems (Mezaris et al., 2005; Wildemuth et al, 2004; Amir et al, 2004; Heesch et al, 2004; Cooke et al, 2004).…”
Section: Related Research
mentioning
confidence: 99%
“…Automatic Speech Recognition (ASR) technology has been developed to turn audio into text (Christel et al, 1998) and to provide textual description of the video content. Even though the quality of the ASR transcript is usually not as good as the human generated video description, they are still the primary data resource for shot level video retrieval systems (Mezaris et al., 2005; Wildemuth et al, 2004; Amir et al, 2004; Heesch et al, 2004; Cooke et al, 2004). In video retrieval, various browsing technologies are widely supported to augment text based query search, in particular when exact queries are hard to form (Carmel et al 1992). This may be because human beings are good at rapidly finding patterns, recognizing objects, generalizing or inferring information from limited data, and making relevance decisions (Helander, 1998; Shneiderman, 1998). For shot level content-based retrieval (where a shot represents a series of consecutive frames with no sudden transition), temporal neighbor browsing is the most common navigation method (Heesch et al, 2004; Wildemuth et al, 2003).…”
mentioning
confidence: 99%
“…Interactive search in particular can benefit from this knowledge, since the user plays such a central role in the process. Studies have been done to measure usability of interactive retrieval systems (e.g., [4]) and effectiveness of different components of these systems ([26]). In this paper we investigate the still unclear impact of user-behaviour and user-characteristics on the performance of interactive retrieval systems.…”
Section: Introduction
mentioning
confidence: 99%
“…There has been less work done in conducting user studies with regard to assessing the effectiveness of the cross-media hypothesis. In [15], the authors report on the TRECVID-2003 interactive search task by comparing three systems' performances (text only, feature only, combined). According to the findings, the system which combined both text and other modal features did not perform as well as expected.…”
Section: Related Work
mentioning
confidence: 99%