2008
DOI: 10.1007/s11263-007-0122-4

Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words

Abstract: We present a novel unsupervised learning method for human action categories. A video sequence is represented as a collection of spatial-temporal words by extracting space-time interest points. The algorithm automatically learns the probability distributions of the spatial-temporal words and the intermediate topics corresponding to human action categories. This is achieved by using latent topic models such as the probabilistic Latent Semantic Analysis (pLSA) model and Latent Dirichlet Allocation (LDA). Our appr…
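The abstract names pLSA but does not show how it is fitted. In pLSA each video d is modeled as a mixture of latent topics z, P(w|d) = sum_z P(w|z) P(z|d), and the parameters are estimated by expectation-maximization (EM) on a matrix of word counts. Below is a minimal pure-Python sketch, assuming each video has already been reduced to counts of spatial-temporal words and contains at least one word; `plsa` and its arguments are hypothetical names, and the random initialization is an illustrative choice rather than the authors' setting.

```python
import math
import random


def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit pLSA by expectation-maximization (hypothetical helper).

    counts[d][w] = number of times spatial-temporal word w occurs in video d.
    Assumes every video contains at least one word.
    Returns (p_w_given_z, p_z_given_d, log_likelihood_history).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        return [v / total for v in row]

    # Random strictly positive start; identical topics would never separate.
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]
    history = []

    for _ in range(n_iter):
        exp_w_z = [[0.0] * n_words for _ in range(n_topics)]
        exp_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        log_lik = 0.0
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(w | z) * P(z | d).
                joint = [p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)]
                p_w_d = sum(joint)
                log_lik += n * math.log(p_w_d)
                for z in range(n_topics):
                    # Expected number of (d, w) occurrences explained by topic z.
                    expected = n * joint[z] / p_w_d
                    exp_w_z[z][w] += expected
                    exp_z_d[d][z] += expected
        # Log-likelihood of the parameters *before* this iteration's update.
        history.append(log_lik)
        # M-step: renormalize the expected counts into distributions.
        p_w_z = [normalize(row) for row in exp_w_z]
        p_z_d = [normalize(row) for row in exp_z_d]
    return p_w_z, p_z_d, history
```

On a toy matrix such as [[5, 5, 0, 0], [5, 5, 0, 0], [0, 0, 5, 5], [0, 0, 5, 5]] with n_topics=2, the first two and last two videos are expected to end up with different dominant topics, i.e. different argmax_z P(z|d). Labeling a video by that argmax is one simple way to read off a category; treat it as an assumption here, not a statement of the paper's exact procedure. The returned history should never decrease, as EM guarantees, which makes a cheap sanity check.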

Cited by 1,371 publications (821 citation statements)
References: 30 publications
“…A hierarchical notion of behavior, that enriches the serial action-based definition by adding different abstraction levels, is proposed in [18,49]. In [17], a completely novel concept of behavior is proposed, which consists of an ensemble of spatiotemporal feature points, deeply investigated in [50]. However, these structured definitions are based on the design of visual bricks (visual words) that do not carry any intuitive meanings.…”
Section: Surveillance and Monitoring (mentioning)
confidence: 99%
“…As noticed by [4] and observed from our experiments, the interest points detected by generalized space-time interest points detector from [16] are too sparse to build model for many complex activities. Therefore, we utilized the one from Dollar [4], which has been proven successful in [4,12,20]. Here we give a brief review of this method.…”
Section: Feature Extraction (mentioning)
confidence: 99%
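The passage above refers to the periodic detector of Dollár et al. As it is usually stated, the detector smooths each frame with a 2-D spatial Gaussian g(x, y; sigma), filters each pixel's time series with a quadrature pair of 1-D Gabor filters h_ev(t) = -cos(2 pi t omega) exp(-t^2 / tau^2) and h_od(t) = -sin(2 pi t omega) exp(-t^2 / tau^2) with omega = 4 / tau, and takes the response R = (I * g * h_ev)^2 + (I * g * h_od)^2; interest points are local maxima of R. The sketch below is a minimal pure-Python rendering under those assumptions. The function names, the defaults sigma=1.0 and tau=2.5, the truncation radii, and the zero padding are illustrative choices, not details taken from the source.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian, truncated at 3 sigma (illustrative choice)."""
    radius = max(1, int(math.ceil(3 * sigma)))
    k = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]


def correlate1d(signal, kernel):
    """Zero-padded, same-size 1-D correlation.

    Correlation instead of convolution is harmless here: h_ev is symmetric and
    flipping the antisymmetric h_od only flips the sign, which squaring removes.
    """
    r = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, kv in enumerate(kernel):
            idx = i + j - r
            if 0 <= idx < n:
                acc += kv * signal[idx]
        out.append(acc)
    return out


def periodic_response(video, sigma=1.0, tau=2.5):
    """R = (I*g*h_ev)^2 + (I*g*h_od)^2 for a grayscale video[t][y][x]."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    g = gaussian_kernel(sigma)
    # Spatial smoothing: separable Gaussian along x, then along y, per frame.
    smoothed = []
    for frame in video:
        rows = [correlate1d(row, g) for row in frame]
        cols = [correlate1d([rows[y][x] for y in range(H)], g) for x in range(W)]
        smoothed.append([[cols[x][y] for x in range(W)] for y in range(H)])
    # Quadrature pair of temporal Gabor filters with omega = 4 / tau.
    omega = 4.0 / tau
    radius = max(1, int(math.ceil(2 * tau)))
    ts = range(-radius, radius + 1)
    h_ev = [-math.cos(2 * math.pi * t * omega) * math.exp(-(t * t) / tau ** 2) for t in ts]
    h_od = [-math.sin(2 * math.pi * t * omega) * math.exp(-(t * t) / tau ** 2) for t in ts]
    R = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    for y in range(H):
        for x in range(W):
            series = [smoothed[t][y][x] for t in range(T)]
            ev = correlate1d(series, h_ev)
            od = correlate1d(series, h_od)
            for t in range(T):
                R[t][y][x] = ev[t] ** 2 + od[t] ** 2
    return R


def strongest_point(R):
    """(t, y, x) of the global maximum; a real detector keeps all local maxima."""
    points = ((t, y, x) for t in range(len(R))
              for y in range(len(R[0])) for x in range(len(R[0][0])))
    return max(points, key=lambda p: R[p[0]][p[1]][p[2]])
```

For a black 9x9 video whose central 3x3 patch flickers on and off from frame to frame, the strongest response is expected at the patch center, while static regions far from the patch respond with zero.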
“…HOG3D [2]) from video appearance, and then apply a standard clustering algorithm. For instance, Wang et al [3] cluster images strictly based on appearance, and Niebles et al [4] develop topic models based on video bag-of-words approaches. However, these methods are generally limited in performance due to the lack of semantics in low-level visual appearance.…”
Section: Introduction (mentioning)
confidence: 99%
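Both the abstract and this passage rely on a bag-of-words step: local descriptors are clustered into a codebook, and each video becomes a histogram of codeword ("spatial-temporal word") occurrences. The following is a minimal sketch, assuming k-means (Lloyd's algorithm) as the clustering step and descriptors given as plain numeric vectors; `build_codebook`, `bag_of_words`, the deterministic farthest-point initialization, and the iteration count are hypothetical, illustrative choices rather than details from the source.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda k: sq_dist(point, centroids[k]))


def build_codebook(descriptors, k, n_iter=20):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centroids = [list(descriptors[0])]
    while len(centroids) < k:
        # Next seed: the descriptor farthest from all seeds chosen so far.
        far = max(descriptors, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in descriptors:
            clusters[nearest(p, centroids)].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                dim = len(members[0])
                centroids[j] = [sum(m[i] for m in members) / len(members)
                                for i in range(dim)]
    return centroids


def bag_of_words(video_descriptors, codebook):
    """Histogram of codeword occurrences for one video."""
    hist = [0] * len(codebook)
    for d in video_descriptors:
        hist[nearest(d, codebook)] += 1
    return hist
```

The per-video histograms produced this way are the kind of count vectors a topic model such as pLSA or LDA takes as input. Farthest-point seeding is used only to keep the sketch deterministic; random or k-means++ seeding are common alternatives.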