Exploiting multi-level parallelism for low-latency activity recognition in streaming video

Chen, Mingyu; Mummert, Lily B.; Pillai, Padmanabhan; Hauptmann, Alexander G.; Sukthankar, Rahul

doi:10.1145/1730836.1730838

Cited by 20 publications (18 citation statements). References 43 publications.


“…Although many have used HoG/HoF descriptors [18,22,17,4], they aggregate them into a static signature, whereas our previous analysis and [36] suggest retaining their temporal evolution. However, rather than averaging by spatial binning (that presumes ergodicity), we prefer to use at least a crude approximation of the prior $dP(g, w)$ in the form of samples $\{g(t_j)\}$, $\{w(x, t_j)\}$ inferred during the training phase.…”

Section: Simplest Instantiation and Inference Of The Representation (mentioning)

confidence: 72%
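The quoted passage contrasts aggregating per-frame HoG/HoF histograms into a static signature with retaining their temporal evolution. A minimal sketch of that distinction, assuming each tracked region yields one fixed-length histogram per frame (the function names and toy data are hypothetical, not taken from either paper):

```python
from statistics import fmean


def static_signature(frames):
    """Average per-frame histograms bin by bin into one fixed-length vector; frame order is lost."""
    n_bins = len(frames[0])
    return [fmean(f[b] for f in frames) for b in range(n_bins)]


def temporal_signature(frames):
    """Keep the full time series: one histogram per frame, in temporal order."""
    return [list(f) for f in frames]


# Two toy tracks that visit the same histograms in opposite order.
frames_a = [[1.0, 0.0], [0.0, 1.0]]
frames_b = [[0.0, 1.0], [1.0, 0.0]]

print(static_signature(frames_a), static_signature(frames_b))        # [0.5, 0.5] [0.5, 0.5]
print(temporal_signature(frames_a) == temporal_signature(frames_b))  # False
```

The static signature cannot tell the two tracks apart, while the time series can, which is the motivation the quote gives for keeping temporal evolution.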

“…While one would want to assemble these elementary actions (dictionary elements) into a model that captures the joint spatio-temporal statistics at a more global spatial scale ("context"), in Sect. 4 we show that even a naive use of the dictionary labels as a "spatial bag" yields competitive performance in end-to-end tasks.…”

Section: Spatio-temporal Tracklet Descriptors (mentioning)

confidence: 95%
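A "spatial bag" of dictionary labels, as the quote describes it, discards where each elementary action occurs and keeps only how often each dictionary element is used. The sketch below assumes labels come from nearest-neighbour assignment in Euclidean distance; the dictionary and descriptors are toy values, and the function names are hypothetical:

```python
import math
from collections import Counter


def nearest_label(descriptor, dictionary):
    """Index of the dictionary element closest to the descriptor (Euclidean distance)."""
    return min(range(len(dictionary)), key=lambda k: math.dist(descriptor, dictionary[k]))


def spatial_bag(descriptors, dictionary):
    """Normalized histogram of dictionary labels; spatial layout is ignored."""
    counts = Counter(nearest_label(d, dictionary) for d in descriptors)
    total = len(descriptors)
    return [counts[k] / total for k in range(len(dictionary))]


dictionary = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
descriptors = [[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [4.8, 5.1]]
print(spatial_bag(descriptors, dictionary))  # [0.25, 0.5, 0.25]
```

The resulting fixed-length histogram can be fed to a basic classifier regardless of how many tracklets a video contains.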

“…Different local descriptors have been proposed to capture shape [34,7] or joint motion and shape [18,17,4] by aggregating features within video cubes centered at spatio-temporal interest points into a static descriptor. In contrast, we retain in our tracklet descriptor the entire feature time series from birth to death of each tracked region.…”

Section: Related Work (mentioning)

confidence: 99%

“…Although "oG" in AoG stands for the gradient orientation, in analogy to HoG, any other contrast-normalizing statistic φ can be used, as in (4). Similarly, we have…”

Section: Simplest Instantiation and Inference Of The Representation (mentioning)

confidence: 99%
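Gradient orientation is one example of a contrast-normalizing statistic: under an affine contrast change I → aI + b with a > 0, the offset b cancels in the finite difference and the positive gain a cancels in the angle. The sketch below checks this numerically on a toy 3×3 patch. It illustrates the invariance the quote appeals to, not the AoG descriptor itself, whose exact definition is not given here; the helper names are hypothetical.

```python
import math


def gradient_orientation(img, y, x):
    """Orientation (radians) of the central-difference gradient at pixel (y, x)."""
    gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
    gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
    return math.atan2(gy, gx)


def affine_contrast(img, a, b):
    """Apply the contrast change I -> a*I + b (assumes a > 0) to every pixel."""
    return [[a * v + b for v in row] for row in img]


img = [[0.0, 1.0, 2.0],
       [1.0, 3.0, 4.0],
       [2.0, 5.0, 7.0]]
theta = gradient_orientation(img, 1, 1)
theta_c = gradient_orientation(affine_contrast(img, 2.5, 10.0), 1, 1)
print(math.isclose(theta, theta_c))  # True
```

The gradient magnitude, by contrast, scales with a, which is why orientation rather than raw gradient values is the natural contrast-insensitive choice.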


Tracklet Descriptors for Action Modeling and Video Analysis. Raptis, Soatto. Computer Vision – ECCV 2010 (2010).

115


Abstract. We present spatio-temporal feature descriptors that can be inferred from video and used as building blocks in action recognition systems. They capture the evolution of "elementary action elements" under a set of assumptions on the image-formation model and are designed to be insensitive to nuisance variability (absolute position, contrast), while retaining discriminative statistics due to the fine-scale motion and the local shape in compact regions of the image. Despite their simplicity, these descriptors, used in conjunction with basic classifiers, attain state-of-the-art performance in the recognition of actions on benchmark datasets.
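The abstract lists absolute position as a nuisance the descriptors are designed to ignore. One common way to achieve this for a tracked region, assumed here for illustration rather than taken from the paper, is to describe its motion by frame-to-frame displacements instead of absolute coordinates (the function name is hypothetical):

```python
def displacements(track):
    """Frame-to-frame displacements of a track of (x, y) positions; absolute position cancels."""
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(track, track[1:])]


track = [(10.0, 20.0), (12.0, 21.0), (15.0, 21.0)]
shifted = [(x + 100.0, y - 50.0) for x, y in track]  # same motion, different location

print(displacements(track))                            # [(2.0, 1.0), (3.0, 0.0)]
print(displacements(track) == displacements(shifted))  # True
```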



“…For example, MoSIFT is a technique to better recognize activities in surveillance video by exploiting continuous object motion explicitly calculated from optical flow, integrated with distinctive appearance features [Chen 2010]. The value of these techniques for video retrieval are assessed through continued participation in international benchmarking forums, such as NIST TRECVID which chart progress on tasks related to video analytics.…”

Section: Combining Multimedia Analysis and Visual Analytics: The Begi… (mentioning)

confidence: 99%
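A rough sketch of the idea the quote attributes to MoSIFT: keep only candidate interest points that show enough motion (here, optical-flow magnitude above a threshold) and concatenate an appearance descriptor with a motion descriptor. The `Candidate` record, the `min_flow` threshold and the descriptor contents are hypothetical placeholders; this is not the MoSIFT implementation from [Chen 2010].

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical interest-point candidate."""
    appearance: list  # appearance descriptor, e.g. a SIFT-like histogram
    flow: tuple       # (dx, dy) optical flow at the point
    motion: list      # motion descriptor, e.g. a histogram of local flow


def fuse(candidates, min_flow=1.0):
    """Keep points whose flow magnitude is at least min_flow; concatenate appearance and motion."""
    kept = []
    for c in candidates:
        if math.hypot(*c.flow) >= min_flow:
            kept.append(c.appearance + c.motion)
    return kept


cands = [
    Candidate([0.2, 0.8], (0.1, 0.0), [1.0, 0.0]),  # nearly static: discarded
    Candidate([0.5, 0.5], (3.0, 4.0), [0.0, 1.0]),  # moving: kept
]
print(fuse(cands))  # [[0.5, 0.5, 0.0, 1.0]]
```

Requiring motion filters out distinctive but static background points, which is the reason the quote gives for combining explicit motion with appearance in surveillance video.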