Proceedings of the First Annual ACM SIGMM Conference on Multimedia Systems 2010
DOI: 10.1145/1730836.1730838

Exploiting multi-level parallelism for low-latency activity recognition in streaming video

Abstract: Video understanding is a computationally challenging task that is critical not only for traditionally throughput-oriented applications such as search but also latency-sensitive interactive applications such as surveillance, gaming, videoconferencing, and vision-based user interfaces. Enabling these types of video processing applications will require not only new algorithms and techniques, but new runtime systems that optimize latency as well as throughput. In this paper, we present a runtime system called Spro…

Cited by 20 publications (18 citation statements)
References 43 publications
“…Although many have used HoG/HoF descriptors [18,22,17,4], they aggregate them into a static signature, whereas our previous analysis and [36] suggest retaining their temporal evolution. However, rather than averaging by spatial binning (that presumes ergodicity), we prefer to use at least a crude approximation of the prior dP(g, w) in the form of samples {g(t_j)}, {w(x, t_j)} inferred during the training phase.…”
Section: Simplest Instantiation and Inference Of The Representation (citation type: mentioning)
confidence: 72%
“…While one would want to assemble these elementary actions (dictionary elements) into a model that captures the joint spatio-temporal statistics at a more global spatial scale ("context"), in Sect. 4 we show that even a naive use of the dictionary labels as a "spatial bag" yields competitive performance in end-to-end tasks.…”
Section: Spatio-temporal Tracklet Descriptors (citation type: mentioning)
confidence: 95%
“…For example, MoSIFT is a technique to better recognize activities in surveillance video by exploiting continuous object motion explicitly calculated from optical flow, integrated with distinctive appearance features [Chen 2010]. The value of these techniques for video retrieval are assessed through continued participation in international benchmarking forums, such as NIST TRECVID which chart progress on tasks related to video analytics.…”
Section: Combining Multimedia Analysis and Visual Analytics: The Begi… (citation type: mentioning)
confidence: 99%