Proceedings of the 16th ACM International Conference on Multimedia 2008
DOI: 10.1145/1459359.1459392

Video event detection using motion relativity and visual relatedness

Abstract: Event detection plays an essential role in video content analysis. However, the existing features are still weak in event detection because: i) most features just capture what is involved in an event or how the event evolves separately, and thus cannot completely describe the event; ii) to capture event evolution information, only motion distribution over the whole frame is used which proves to be noisy in unconstrained videos; iii) the estimated object motion is usually distorted by camera movement. To cope w…

Cited by 80 publications (49 citation statements). References 31 publications (39 reference statements).

Citation statements, ordered by relevance:
“…However, computation of trajectory descriptors requires substantial computational overhead. The first of its kind was proposed by Wang et al [149], where the authors used the well-known Kanade-Lucas-Tomasi (KLT) tracker [79] to extract DoG-SIFT key-point trajectories and computed a feature by modeling the motion between every trajectory pair. Sun et al [132] also applied KLT to track DoG-SIFT key-points.…”
Section: Trajectory Descriptors
Citation type: mentioning (confidence: 99%)
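The KLT-plus-pairwise-motion idea quoted above can be sketched in a few lines. The snippet below is a minimal illustration, not the cited authors' implementation: it tracks generic corners (standing in for DoG-SIFT key-points) with OpenCV's pyramidal Lucas-Kanade tracker and histograms the relative motion of every tracked pair.

```python
# Minimal sketch (not the cited authors' code): track key-points with a KLT
# tracker and build a pairwise relative-motion histogram. Assumes OpenCV + NumPy;
# goodFeaturesToTrack corners stand in for DoG-SIFT key-points.
import cv2
import numpy as np

def pairwise_relative_motion(frame_prev, frame_next, max_corners=200, n_bins=8):
    gray_prev = cv2.cvtColor(frame_prev, cv2.COLOR_BGR2GRAY)
    gray_next = cv2.cvtColor(frame_next, cv2.COLOR_BGR2GRAY)

    # Detect points to track in the first frame.
    pts = cv2.goodFeaturesToTrack(gray_prev, maxCorners=max_corners,
                                  qualityLevel=0.01, minDistance=7)
    if pts is None:
        return np.zeros(n_bins)

    # Pyramidal Lucas-Kanade (KLT) tracking into the next frame.
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(gray_prev, gray_next, pts, None)
    ok = status.ravel() == 1
    motion = (nxt - pts).reshape(-1, 2)[ok]            # per-point displacement

    # Relative motion between every pair of tracked points.
    rel = motion[:, None, :] - motion[None, :, :]
    iu = np.triu_indices(len(motion), k=1)
    rel = rel[iu]                                       # unique pairs only

    # Orientation histogram of the relative displacements.
    angles = np.arctan2(rel[:, 1], rel[:, 0])
    hist, _ = np.histogram(angles, bins=n_bins, range=(-np.pi, np.pi))
    return hist / max(hist.sum(), 1)
```

Because the feature is built from pairwise differences, a global camera translation shifts all displacements equally and largely cancels out, which is the motivation for relative motion in the original paper.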
“…Different from [149], they computed three levels of trajectory context, including point-level context which is an averaged SIFT descriptor, intra-trajectory context which models trajectory transitions over time, and inter-trajectory context which encodes proximities between trajectories. The velocity histories of key-point trajectories are modeled by Messing et al [87], who observed that velocity information is useful for detecting daily living actions in high-resolution videos.…”
Section: Trajectory Descriptors
Citation type: mentioning (confidence: 99%)
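As a rough illustration of the context levels described above, the sketch below computes a point-level context (averaged SIFT descriptor) and an intra-trajectory context (a transition matrix over quantized displacement directions) for one trajectory. The exact formulation in the cited work differs; the binning and function name here are assumptions, and the inter-trajectory level is omitted for brevity.

```python
# Illustrative only: one possible realization of the point-level and
# intra-trajectory context levels (binning and names are assumptions).
import numpy as np

def trajectory_context(points, sift_per_frame, n_dirs=8):
    """points: (T, 2) trajectory positions; sift_per_frame: (T, 128) SIFT descriptors."""
    # Point-level context: average the SIFT descriptors along the trajectory.
    point_level = sift_per_frame.mean(axis=0)

    # Intra-trajectory context: quantize frame-to-frame displacements into
    # direction bins and count transitions between consecutive bins over time.
    disp = np.diff(points, axis=0)
    dirs = ((np.arctan2(disp[:, 1], disp[:, 0]) + np.pi)
            / (2 * np.pi) * n_dirs).astype(int) % n_dirs
    intra = np.zeros((n_dirs, n_dirs))
    for a, b in zip(dirs[:-1], dirs[1:]):
        intra[a, b] += 1

    return point_level, intra.ravel()
```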
“…For example, Wang et al [24] proposed to incorporate a number of motion primitives of each visual word into the BovW representation of videos. Nevertheless, the above approaches usually directly include the spatial-temporal information into visual content representation, and the storage and computational cost is often high.…”
Section: Related Work
Citation type: mentioning (confidence: 99%)
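A minimal sketch of the idea in the first sentence above: augmenting a bag-of-visual-words histogram with per-word motion information. It also makes the storage concern in the second sentence concrete, since the representation grows by a factor equal to the number of motion bins. The speed-based binning and all parameter names are assumptions for illustration, not the cited method.

```python
# Minimal sketch under assumed inputs: joint visual-word x motion-bin counts,
# a simplified take on adding motion primitives to a BoVW histogram.
import numpy as np

def bovw_with_motion(word_ids, displacements, vocab_size, n_motion_bins=5):
    """word_ids: (N,) visual-word index per local feature;
    displacements: (N, 2) per-feature motion vectors."""
    if len(word_ids) == 0:
        return np.zeros(vocab_size * n_motion_bins)

    # Crude "motion primitives": quantize each feature's speed (assumption).
    speeds = np.linalg.norm(displacements, axis=1)
    edges = np.linspace(0.0, speeds.max() + 1e-6, n_motion_bins + 1)
    motion_bins = np.clip(np.digitize(speeds, edges) - 1, 0, n_motion_bins - 1)

    # Accumulate a vocab_size x n_motion_bins table, then flatten it; the
    # representation is n_motion_bins times larger than plain BoVW.
    hist = np.zeros((vocab_size, n_motion_bins))
    np.add.at(hist, (word_ids, motion_bins), 1)
    return hist.ravel() / len(word_ids)
```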
“…near-identical regions being assigned to different visual words, soft-quantization of visual words [9] has been proposed to map each descriptor onto multiple neighboring visual words (in the descriptor feature space). Despite its simple structure, the BovW model has shown promising performance in fields such as object/event recognition [24] and image/video retrieval [20].…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
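Soft-quantization as described above can be sketched as a kernel-weighted assignment to the k nearest visual words; k and the Gaussian bandwidth sigma below are illustrative assumptions, not values from [9].

```python
# Sketch of soft-quantization: each descriptor votes for its k nearest visual
# words with a Gaussian weight (k and sigma are assumptions).
import numpy as np

def soft_assign(descriptors, vocabulary, k=3, sigma=100.0):
    """descriptors: (N, D); vocabulary: (V, D). Returns a (V,) soft BoVW histogram."""
    hist = np.zeros(len(vocabulary))
    for d in descriptors:
        dist = np.linalg.norm(vocabulary - d, axis=1)   # distance to every word
        nearest = np.argsort(dist)[:k]
        w = np.exp(-dist[nearest] ** 2 / (2 * sigma ** 2))
        hist[nearest] += w / (w.sum() + 1e-12)          # each descriptor votes weight 1
    return hist / max(len(descriptors), 1)
```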
“…Inspired by the method proposed by Wang et al [5] which classifies the relative motion of visual words to represent the temporal patterns in a video, we propose to utilize relative motion to model the temporal relation of visual words.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
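To make the quoted idea concrete, the sketch below pools relative motion at the visual-word level: for every ordered pair of words observed in a frame, it accumulates the mean relative displacement of their key-points. This is one illustrative reading of "relative motion of visual words", not the cited paper's exact feature.

```python
# Sketch under assumed inputs: mean relative displacement for every ordered
# pair of visual words co-occurring in a frame.
import numpy as np

def word_pair_relative_motion(word_ids, displacements, vocab_size):
    """word_ids: (N,) visual word per tracked key-point; displacements: (N, 2)."""
    rel = np.zeros((vocab_size, vocab_size, 2))
    count = np.zeros((vocab_size, vocab_size)) + 1e-12
    for i in range(len(word_ids)):
        for j in range(len(word_ids)):
            if i == j:
                continue
            # Motion of word_ids[i]'s key-point relative to word_ids[j]'s.
            rel[word_ids[i], word_ids[j]] += displacements[i] - displacements[j]
            count[word_ids[i], word_ids[j]] += 1
    return (rel / count[..., None]).ravel()
```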