Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words

Niebles, Juan Carlos; Wang, Hongcheng; Li, Feifei

doi:10.1007/s11263-007-0122-4

Cited by 1,371 publications

(821 citation statements)

References 30 publications

Supporting

Mentioning: 791

Contrasting

Unclassified

“…A hierarchical notion of behavior, that enriches the serial action-based definition by adding different abstraction levels, is proposed in [18,49]. In [17], a completely novel concept of behavior is proposed, which consists of an ensemble of spatiotemporal feature points, deeply investigated in [50]. However, these structured definitions are based on the design of visual bricks (visual words) that do not carry any intuitive meanings.…”

Section: Surveillance and Monitoring (mentioning)

confidence: 99%

Socially intelligent surveillance and monitoring: Analysing social dimensions of physical space

Cristani

Murino

Vinciarelli

2010

2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops

In general terms, surveillance and monitoring technologies aim at understanding what people do in a given environment, whether this means to ensure the safety of workers on the factory floor, to detect crimes occurring in indoor or outdoor settings, or to monitor the flow of large crowds through public spaces. However, surveillance and monitoring technologies rarely consider that they analyze human behavior, a phenomenon subject to principles and laws rigorous enough to produce stable and predictable patterns corresponding to social, affective, and psychological phenomena. On the other hand, these phenomena are the subject of other computing domains, in particular Social Signal Processing and Affective Computing, that typically neglect scenarios relevant to surveillance and monitoring technologies, especially when it comes to social and affective dimensions of space in human activities. The goal of this paper is to show that the investigation of the overlapping area between surveillance and monitoring on one side, and Social Signal Processing and Affective Computing on the other side can bring significant progress in both domains and open a number of interesting research perspectives.

“…As noticed by [4] and observed from our experiments, the interest points detected by generalized space-time interest points detector from [16] are too sparse to build model for many complex activities. Therefore, we utilized the one from Dollar [4], which has been proven successful in [4,12,20]. Here we give a brief review of this method.…”

Section: Feature Extraction (mentioning)

confidence: 99%

Spatiotemporal pyramid representation for recognition of facial expressions and hand gestures

Zhao

Elgammal

2008

2008 8th IEEE International Conference on Automatic Face & Gesture Recognition

“…HOG3D [2]) from video appearance, and then apply a standard clustering algorithm. For instance, Wang et al [3] cluster images strictly based on appearance, and Niebles et al [4] develop topic models based on video bag-of-words approaches. However, these methods are generally limited in performance due to the lack of semantics in low-level visual appearance.…”

Section: Introduction (mentioning)

confidence: 99%

Discovering Video Clusters from Visual Features and Noisy Tags

Vahdat

Zhou

Mori

2014

Computer Vision – ECCV 2014

Abstract. We present an algorithm for automatically clustering tagged videos. Collections of tagged videos are commonplace, however, it is not trivial to discover video clusters therein. Direct methods that operate on visual features ignore the regularly available, valuable source of tag information. Solely clustering videos on these tags is error-prone since the tags are typically noisy. To address these problems, we develop a structured model that considers the interaction between visual features, video tags and video clusters. We model tags from visual features, and correct noisy tags by checking visual appearance consistency. In the end, videos are clustered from the refined tags as well as the visual features. We learn the clustering through a max-margin framework, and demonstrate empirically that this algorithm can produce more accurate clustering results than baseline methods based on tags or visual features, or both. Further, qualitative results verify that the clustering results can discover sub-categories and more specific instances of a given video category.
