2010
DOI: 10.1109/tmm.2010.2066960

Sequence Multi-Labeling: A Unified Video Annotation Scheme With Spatial and Temporal Context

Abstract: Automatic video annotation is a challenging yet important problem for content-based video indexing and retrieval. In most existing works, annotation is formulated as a multi-labeling problem over individual shots. However, video is by nature informative in the spatial and temporal context of semantic concepts. In this paper, we formulate video annotation as a sequence multi-labeling (SML) problem over a shot sequence. Different from many video annotation paradigms working on individual shots, SML aims to …
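
The abstract contrasts multi-labeling over individual shots with labeling over a whole shot sequence. As a rough, hedged illustration of that difference (not the paper's actual model; the concept names, score matrix, and the simple temporal-consistency heuristic below are invented for the example):

```python
# Sketch only: per-shot multi-labeling labels each shot independently, while
# sequence multi-labeling (SML) chooses the label matrix for the whole shot
# sequence jointly. All names and values here are illustrative assumptions.
import numpy as np

CONCEPTS = ["person", "outdoor", "car"]      # toy concept vocabulary
T, K = 5, len(CONCEPTS)                      # 5 shots, K concepts
rng = np.random.default_rng(1)
per_shot_scores = rng.random((T, K))         # stand-in for per-shot detector outputs

# (a) multi-labeling over individual shots: threshold each shot on its own
independent_labels = per_shot_scores > 0.5

# (b) sequence multi-labeling: decide the whole sequence together, here by
# copying the previous shot's decision whenever the current score is ambiguous
def sml_decode(scores, margin=0.15):
    labels = scores > 0.5
    for t in range(1, len(scores)):
        ambiguous = np.abs(scores[t] - 0.5) < margin
        labels[t, ambiguous] = labels[t - 1, ambiguous]
    return labels

print(independent_labels)
print(sml_decode(per_shot_scores))
```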

Cited by 19 publications (8 citation statements). References 41 publications (83 reference statements).
“…Their idea assumes that if most of a video v_i labeled t_i includes a sequence of a video v_j labeled t_j, the likelihood that v_i will be labeled t_j increases. For video annotation, Yuanning et al. [13] propose to consider the spatial and temporal contexts using a kernel function that takes into account the temporal correlation and the spatial correlation between concepts. For example, we may find a rule of the type: when a concept c_i is present in three successive shots, the concepts c_j and c_k co-occur in the last two successive shots.…”
Section: Related Work
Mentioning confidence: 99%
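
The spatio-temporal rule quoted above can be made concrete with a small, hypothetical check over a shot-by-concept label matrix; the array contents and concept indices below are illustrative only and are not taken from the cited work:

```python
# Hypothetical sketch of the rule "when concept c_i is present in three
# successive shots, concepts c_j and c_k co-occur in the last two of those
# shots." The data and indices are toy assumptions for illustration.
import numpy as np

def rule_holds(labels, ci, cj, ck):
    """labels: boolean array of shape (num_shots, num_concepts)."""
    T = labels.shape[0]
    for t in range(T - 2):
        if labels[t:t + 3, ci].all():                 # c_i in three successive shots
            if not (labels[t + 1:t + 3, cj].all() and
                    labels[t + 1:t + 3, ck].all()):   # c_j and c_k in the last two
                return False
    return True

labels = np.array([[1, 0, 0],
                   [1, 1, 1],
                   [1, 1, 1],
                   [0, 0, 1]], dtype=bool)
print(rule_holds(labels, ci=0, cj=1, ck=2))           # True for this toy sequence
```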
“…Some researchers have tried to deal with this problem. Indeed, several categories of context have been considered: semantic [2], [3], [4], spatial [5], scale [6], temporal [7], [8], [9], [10], [11], [12], [13]. Videos have a characteristic that differentiates them from still images: the temporal aspect.…”
Section: Introduction
Mentioning confidence: 99%
“…Their extended work can also be found in [12], in which they verified the effectiveness of tag ranking through extensive experiments. Instead of recommending tags to each individual video shot, Li et al. [5] model video annotation as a sequence multi-labeling problem. They jointly consider spatial and temporal context in consecutive video shots and infer the best labeling sequence in a global optimization manner.…”
Section: Video Tagging
Mentioning confidence: 99%
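
To make "infer the best labeling sequence in a global optimization manner" concrete, here is a minimal sketch of one common way such inference can be done, per-concept Viterbi decoding with a temporal switching penalty; this is an assumption for illustration, not the cited paper's actual algorithm:

```python
# Minimal sketch, under assumptions: decode a concept's label sequence over
# shots by trading off per-shot detector scores against a switching penalty.
# Scores and the penalty value are invented; this is NOT the paper's method.
import numpy as np

def viterbi_binary(scores, switch_cost=0.4):
    """scores: (T,) detector confidences for one concept; returns 0/1 labels."""
    T = len(scores)
    unary = np.stack([scores, 1.0 - scores], axis=1)  # cost of labeling 0 / 1
    dp = unary[0].copy()                              # best cost ending in 0 / 1
    back = np.zeros((T, 2), dtype=int)
    for t in range(1, T):
        new_dp = np.empty(2)
        for y in (0, 1):
            trans = dp + switch_cost * (np.arange(2) != y)  # penalty for switching labels
            back[t, y] = int(np.argmin(trans))
            new_dp[y] = unary[t, y] + trans[back[t, y]]
        dp = new_dp
    labels = np.zeros(T, dtype=int)
    labels[-1] = int(np.argmin(dp))
    for t in range(T - 1, 0, -1):                     # backtrack the best path
        labels[t - 1] = back[t, labels[t]]
    return labels

# the ambiguous middle shot (0.45) is pulled toward its temporal neighbors
print(viterbi_binary(np.array([0.9, 0.8, 0.45, 0.85, 0.1])))  # -> [1 1 1 1 0]
```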
“…As concept detectors have flourished in recent years, many researchers tackle this issue by detecting concepts in video frames, with the main consideration on visual features. For example, Li et al. [5] jointly consider spatial correlation, temporal consistency, and temporal dependency of audiovisual features, and formulate video annotation as a sequence multi-labeling problem. Features directly extracted from video content are used to construct classifiers in this kind of work.…”
Section: Introduction
Mentioning confidence: 99%
“…In general terms, state transition-based classification usually requires a clear definition of the individual states, which is the most difficult part for developers; otherwise, the precision rate may decrease. In addition to the previously mentioned approaches, there have also been research efforts [Tang et al. 2008; Li et al. 2010] that made use of other classifiers, such as support vector machines (SVMs) [Li et al. 2010] and neural networks (NNs) [Tang et al. 2008], in order to detect objects of interest. However, these types of classifiers require an additional mechanism to analyze time-series features in order to recognize motion events.…”
Section: Introduction
Mentioning confidence: 99%
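
As a hedged illustration of why a per-shot classifier such as an SVM needs "an additional mechanism to analyze time-series features", the sketch below trains a toy SVM on synthetic data and then smooths its independent per-shot decisions with a sliding-window majority vote; the data, window size, and smoothing rule are assumptions made for the example, not drawn from the cited works:

```python
# Sketch only: an SVM labels each shot independently, so a separate temporal
# step (here a sliding-window majority vote) is layered on top to turn noisy
# per-shot decisions into more stable event segments. All data are synthetic.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 10))
y_train = (X_train[:, 0] + 0.2 * rng.normal(size=200) > 0).astype(int)

clf = SVC(kernel="rbf").fit(X_train, y_train)

# per-shot predictions for a new sequence of 30 shots
X_seq = rng.normal(size=(30, 10))
raw = clf.predict(X_seq)

# additional temporal mechanism: majority vote over a sliding window
def smooth(labels, window=5):
    half = window // 2
    padded = np.pad(labels, half, mode="edge")
    return np.array([int(padded[i:i + window].mean() > 0.5)
                     for i in range(len(labels))])

print(raw)
print(smooth(raw))
```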