This study describes experiments on automatic detection of semantic concepts, which are textual descriptions about the digital video content. The concepts can be further used in content-based categorization and access of digital video repositories. Temporal Gradient Correlograms, Temporal Color Correlograms and Motion Activity low-level features are extracted from the dynamic visual content of a video shot. Semantic concepts are detected with an expeditious method that is based on the selection of small positive example sets and computational low-level feature similarities between video shots. Detectors using several feature and fusion operator configurations are tested in 60-hour news video database from TRECVID 2003 benchmark. Results show that the feature fusion based on ranked lists gives better detection performance than fusion of normalized low-level feature spaces distances. Best performance was obtained by pre-validating the configurations of features and rank fusion operators. Results also show that minimum rank fusion of temporal color and structure provides comparable performance.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.