2003
DOI: 10.1007/3-540-45113-7_26

Detecting Semantic Concepts from Video Using Temporal Gradients and Audio Classification

Abstract: In this paper we describe new methods to detect semantic concepts from digital video based on audible and visual content. The Temporal Gradient Correlogram captures temporal correlations of gradient edge directions from sampled shot frames. Power-related physical features are extracted from short audio samples in video shots. Video shots containing people, cityscape, landscape, speech or instrumental sound are detected with trained self-organized maps and kNN classification results of audio samples. Test…
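The abstract does not say which power-related features are extracted or how the kNN step is configured. As a rough illustration only, the Python sketch below assumes three hypothetical features per audio sample (mean frame power, standard deviation of frame power, and the fraction of low-energy frames) and labels a sample by majority vote among its k nearest labelled samples. The frame length, the low-energy threshold, k, the sampling rate, and the synthetic "speech-like" and "instrumental-like" signals are all assumptions, not values from the paper.

```python
import math
from collections import Counter

RATE = 8000  # assumed sampling rate (Hz) for the synthetic examples


def frame_powers(samples, frame_len=256):
    """Mean squared amplitude of each non-overlapping frame (frame size is an assumption)."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def power_features(samples, frame_len=256):
    """Hypothetical power-related features: mean frame power, standard deviation
    of frame power, and the fraction of frames below half the mean power."""
    p = frame_powers(samples, frame_len)
    if not p:
        raise ValueError("audio sample is shorter than one frame")
    mean = sum(p) / len(p)
    std = math.sqrt(sum((x - mean) ** 2 for x in p) / len(p))
    low_ratio = sum(1 for x in p if x < 0.5 * mean) / len(p)
    return (mean, std, low_ratio)


def knn_classify(query, training, k=3):
    """Majority label among the k training vectors nearest to `query` (Euclidean distance).
    training: list of (feature_vector, label) pairs."""
    nearest = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Synthetic one-second signals standing in for real audio samples.
def tone(freq, amp):
    return [amp * math.sin(2 * math.pi * freq * n / RATE) for n in range(RATE)]


def bursty(freq, amp, period):
    # Tone switched on and off every `period` samples: a crude stand-in for speech pauses.
    return [s if (n // period) % 2 == 0 else 0.0 for n, s in enumerate(tone(freq, amp))]


training = [(power_features(tone(f, a)), "instrumental")
            for f, a in [(440, 1.0), (660, 0.8), (330, 0.9)]]
training += [(power_features(bursty(f, a, p)), "speech")
             for f, a, p in [(300, 1.0, 1024), (200, 0.8, 768), (250, 0.9, 1280)]]

print(knn_classify(power_features(bursty(350, 0.85, 512)), training))
```

Steady tones give near-zero power variance and no low-energy frames, while signals with pauses score high on both, so even this crude feature set separates the two synthetic classes. Real speech/music discrimination would need features and training data chosen far more carefully.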

Cited by 6 publications (9 citation statements)
References 17 publications

“…In [6] we observed that several concepts coexist and correlate in a video, which is not suitable for multi-class classifiers. Our approach is to have several simplified concept detectors that are trained using small sets of positive example shots, each propagating labels to their nearest neighbors in selected feature spaces.…”
Section: Detecting Semantic Concepts (mentioning)
confidence: 96%
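The quoted passage describes one simplified binary detector per concept, trained from a small set of positive example shots, that propagates its label to nearest neighbors in a selected feature space. The Python sketch below illustrates that idea under stated assumptions: Euclidean distance, a fixed number k of neighbors per positive example, and the hypothetical names `propagate_concept` and `detect_concepts`. The feature spaces ("visual", "audio") and all vectors are toy values, not data from the paper.

```python
import math


def propagate_concept(positives, shot_features, k):
    """One binary detector: each positive example labels its k nearest shots.
    positives: list of feature vectors; shot_features: shot_id -> feature vector."""
    labelled = set()
    for p in positives:
        ranked = sorted(shot_features, key=lambda sid: math.dist(p, shot_features[sid]))
        labelled.update(ranked[:k])
    return labelled


def detect_concepts(detectors, shots, k=2):
    """Run independent per-concept detectors, each in its own feature space.
    detectors: concept -> (feature_space_name, list of positive example vectors)
    shots: shot_id -> {feature_space_name: vector}
    A shot may end up with several concepts, or none."""
    result = {sid: set() for sid in shots}
    for concept, (space, positives) in detectors.items():
        features = {sid: f[space] for sid, f in shots.items()}
        for sid in propagate_concept(positives, features, k):
            result[sid].add(concept)
    return result


# Toy data: two hypothetical feature spaces per shot.
shots = {
    "s1": {"visual": (0.9, 0.1), "audio": (0.2, 0.5)},
    "s2": {"visual": (0.8, 0.2), "audio": (0.9, 0.1)},
    "s3": {"visual": (0.1, 0.9), "audio": (0.25, 0.45)},
    "s4": {"visual": (0.2, 0.8), "audio": (0.85, 0.15)},
}
detectors = {
    "cityscape": ("visual", [(1.0, 0.0)]),
    "speech": ("audio", [(0.2, 0.5)]),
}
result = detect_concepts(detectors, shots)
```

Because each detector runs independently, shot `s1` can be labelled with both "cityscape" and "speech", which is the co-occurrence that the quote says a single multi-class classifier handles poorly.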
“…TGC, initially used in the detector experiments in [6], describes spatial correlation of edge orientations in an autocorrelogram. The feature is computed from the 20 temporally sampled video frames in a shot.…”
Section: Low-level Features (mentioning)
confidence: 99%
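The quoted description gives only the outline of TGC: an autocorrelogram of edge orientations computed from 20 temporally sampled frames of a shot. The Python sketch below is one simplified reading of that outline, not the published definition. It assumes forward-difference gradients, 8 orientation bins, a gradient-magnitude threshold, only horizontal and vertical offsets at distances 1, 3 and 5, evenly spaced frame sampling, and averaging of the per-frame autocorrelograms over the sampled frames. All function names are hypothetical.

```python
import math


def sample_indices(n_frames, n_samples=20):
    """Evenly spaced frame indices across a shot (uniform spacing is an assumption)."""
    if n_frames <= n_samples:
        return list(range(n_frames))
    step = n_frames / n_samples
    return [int(i * step) for i in range(n_samples)]


def edge_directions(frame, n_bins=8, threshold=10.0):
    """Quantised gradient orientation per pixel, or None where the gradient is weak.
    frame: 2-D list of grey levels; gradients are simple forward differences."""
    h, w = len(frame), len(frame[0])
    dirs = [[None] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            gx = frame[y][x + 1] - frame[y][x]
            gy = frame[y + 1][x] - frame[y][x]
            if math.hypot(gx, gy) >= threshold:
                angle = math.atan2(gy, gx) % math.pi  # orientation in [0, pi), sign ignored
                dirs[y][x] = min(int(angle / math.pi * n_bins), n_bins - 1)
    return dirs


def autocorrelogram(dirs, n_bins=8, distances=(1, 3, 5)):
    """For each bin b and distance d: the fraction of in-bounds neighbours at
    horizontal/vertical offset d from a bin-b edge pixel that are also in bin b."""
    h, w = len(dirs), len(dirs[0])
    hits = [[0] * len(distances) for _ in range(n_bins)]
    totals = [[0] * len(distances) for _ in range(n_bins)]
    for y in range(h):
        for x in range(w):
            b = dirs[y][x]
            if b is None:
                continue
            for i, d in enumerate(distances):
                for dy, dx in ((0, d), (0, -d), (d, 0), (-d, 0)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        totals[b][i] += 1
                        if dirs[ny][nx] == b:
                            hits[b][i] += 1
    return [[hits[b][i] / totals[b][i] if totals[b][i] else 0.0
             for i in range(len(distances))] for b in range(n_bins)]


def tgc_feature(frames, n_bins=8, distances=(1, 3, 5)):
    """Average the per-frame autocorrelograms over the sampled frames and flatten."""
    if not frames:
        raise ValueError("no frames sampled from the shot")
    acc = [0.0] * (n_bins * len(distances))
    for frame in frames:
        c = autocorrelogram(edge_directions(frame, n_bins), n_bins, distances)
        acc = [a + v for a, v in zip(acc, (v for row in c for v in row))]
    return [a / len(frames) for a in acc]


# Toy usage: a 40-frame shot of identical vertical stripes, 20 frames sampled.
stripes = [[255 if (x // 4) % 2 == 0 else 0 for x in range(16)] for _ in range(16)]
shot = [stripes] * 40
feature = tgc_feature([shot[i] for i in sample_indices(len(shot))])
```

For the striped frames every edge is vertical, so only the first orientation bin receives non-zero correlogram values. How the published TGC combines information across frames may differ from the simple averaging assumed here.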