2009
DOI: 10.1109/tmm.2009.2017619

Episode-Constrained Cross-Validation in Video Concept Retrieval

Abstract: Whereas video tells a narrative through a composition of shots, current video retrieval methods focus mainly on single shots. In retrieval performance estimation, similar shots within a narrative may result in performance overestimation. We propose an episode-based version of cross-validation that yields up to 14% classification improvement over shot-based cross-validation.
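A minimal sketch of the idea behind episode-constrained cross-validation, using scikit-learn's GroupKFold as a stand-in for the paper's splitting procedure (the features, labels, episode ids, and classifier below are hypothetical placeholders, not the authors' data or method): shots from the same episode are kept in the same fold, so near-duplicate shots from one narrative cannot appear on both sides of a train/test split.

    # Illustrative only: grouped cross-validation keyed on episode id.
    import numpy as np
    from sklearn.model_selection import GroupKFold
    from sklearn.svm import LinearSVC
    from sklearn.metrics import average_precision_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 64))            # one feature vector per shot
    y = rng.integers(0, 2, size=1000)          # concept present / absent per shot
    episodes = rng.integers(0, 50, size=1000)  # episode id of each shot

    scores = []
    for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups=episodes):
        # train and test folds never share an episode
        clf = LinearSVC().fit(X[train_idx], y[train_idx])
        scores.append(average_precision_score(y[test_idx],
                                              clf.decision_function(X[test_idx])))
    print("episode-constrained CV average precision: %.3f" % np.mean(scores))

A plain shot-based K-fold split would instead scatter shots from the same episode across folds, which is the source of the overestimation the abstract describes.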

Cited by 10 publications (8 citation statements). References 22 publications (23 reference statements).
“…Current action recognition methods determine which action occurs in a video with good accuracy [9,13,23,30,32]. The task of localization is more demanding as it also requires specifying the location where the action happens in the video.…”
Section: Related Work (mentioning)
confidence: 99%
“…During training, 10 samples are retrieved from random locations for each frame of each training video, yielding roughly 750,000 samples to be trained by the Decision Forest. The main parameters of the Forest, the randomness and the number of trees, are set through validation [40]. For a test video, samples are extracted every 11th pixel in width and height for each frame, followed by individual classification.…”
Section: Implementation Details (mentioning)
confidence: 99%
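A rough illustration of the stride-11 test-time sampling described in that excerpt (the frame size and function name are assumed for illustration; the cited paper's feature extraction and forest classifier are not reproduced here):

    # Illustrative only: one sample every 11th pixel in width and height of a frame.
    import numpy as np

    def grid_sample_positions(height, width, stride=11):
        # (y, x) coordinates of one sample per `stride` pixels in each dimension
        ys, xs = np.meshgrid(np.arange(0, height, stride),
                             np.arange(0, width, stride), indexing="ij")
        return np.stack([ys.ravel(), xs.ravel()], axis=1)

    positions = grid_sample_positions(height=240, width=320)  # hypothetical frame size
    print(len(positions), "sample positions per frame")       # each would be classified individually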
“…Next to convolutional networks, other competitive object detection methods are based on the bag-of-words (BOW) model [31,33,34] or its Fisher vector incarnation [6,28]. Such methods start with a limited set of object-proposals to reduce the search space.…”
Section: Automatic Object Detection (mentioning)
confidence: 99%