Proceedings of the British Machine Vision Conference 2006
DOI: 10.5244/C.20.127
Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words

Abstract: Imagine a video taken on a sunny beach: can a computer automatically tell what is happening in the scene? Can it identify different human activities in the video, such as water surfing, people walking, and lying on the beach? Automatically classifying or localizing different actions in video sequences is very useful for a variety of tasks, such as video surveillance, object-level video summarization, video indexing, digital library organization, etc. However, it remains a challenging task for computers …

Cited by 413 publications (347 citation statements). References 1 publication.
“…For example, in Laptev et al (2007), the authors propose to use event-based local motion representations (here, spatial-temporal chunks of a video corresponding to 2D + t edges) and template matching. This idea of extracting spatial-temporal features was proposed in several contributions, such as Dollar et al (2005), and then Niebles et al (2006) and Wong et al (2007), using the notion of cuboids. Another stream of approaches was inspired by the work of Serre (2006), first applied to object recognition (Mutch and Lowe 2006) and then extended to action recognition (Sigala et al 2005; Jhuang et al 2007).…”
Section: How Computer Vision Does? (mentioning)
confidence: 99%
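The cuboid features this excerpt refers to are typically extracted around maxima of the periodic spatio-temporal interest-point detector of Dollar et al (2005): spatial Gaussian smoothing followed by a quadrature pair of 1D temporal Gabor filters. A minimal sketch of that response function, assuming a (T, H, W) grayscale video and illustrative parameter values:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, convolve1d

def cuboid_response(video, sigma=2.0, tau=1.5):
    """Response function of the periodic spatio-temporal detector of
    Dollar et al (2005). `video` is a (T, H, W) float array; the
    parameter defaults here are illustrative assumptions."""
    # Spatial smoothing of each frame (no temporal blur on axis 0).
    smoothed = gaussian_filter(video, sigma=(0, sigma, sigma))
    # Quadrature pair of 1D temporal Gabor filters.
    half = 3 * int(np.ceil(tau))
    t = np.arange(-half, half + 1)
    omega = 4.0 / tau                      # frequency tied to tau, as in the paper
    window = np.exp(-t**2 / tau**2)
    h_ev = -np.cos(2 * np.pi * t * omega) * window
    h_od = -np.sin(2 * np.pi * t * omega) * window
    even = convolve1d(smoothed, h_ev, axis=0)
    odd = convolve1d(smoothed, h_od, axis=0)
    # Local maxima of R mark spatio-temporal interest points;
    # cuboids are the video patches cut out around them.
    return even**2 + odd**2
```

Regions with periodic or abrupt intensity change over time score high, while static regions score near zero, which is what makes the detector respond to motion events rather than static texture.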
“…[Zhang and Parker (2011) use an unsupervised learning algorithm with Kinect data, but create their own segmented, scripted, laboratory data set.] On the other hand, some studies have used unsupervised or semi-supervised algorithms and real-world data with other types of sensors (Krause et al 2003; Wang et al 2009; Stikic et al 2008; Niebles et al 2008; Mahdaviani and Choudhury 2008). Table 1 lists the attributes of several studies, including those cited in the previous paragraph, with a column of indicators for each attribute.…”
Section: Background Research (mentioning)
confidence: 97%
“…While we classify this as real-world data, it was not collected in a real-world setting, due to the intrusiveness of on-body sensors. Niebles et al (2008) used video segments of figure skaters. While the activities were not scripted, the authors did preselect video segments. [Table note: an "x" indicates that the research includes the attribute.]…”
Section: Background Research (mentioning)
confidence: 99%
“…They were used to solve the problems of scene categorization (Fei-Fei and Perona 2005; Sudderth et al 2007), object recognition (Sivic et al 2005; Sudderth et al 2007), human action recognition (Niebles et al 2006; Niebles and Fei-Fei 2007), and video analysis (Wang et al 2009). Fox and Willsky (2006) used a Dirichlet process to solve the problem of data association for multi-target tracking in the presence of an unknown number of targets.…”
Section: Related Work (mentioning)
confidence: 99%
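The unsupervised action-category discovery these citations describe represents each video as a bag of spatial-temporal words (quantized cuboid descriptors) and fits a topic model over the codeword histograms. A hedged sketch of that idea on synthetic counts — the codebook size, the synthetic data, and the use of scikit-learn's LDA are all illustrative assumptions (the original work used pLSA/LDA variants over real descriptor codebooks):

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)

# Two synthetic "action categories", each favoring one half of a
# 50-entry codebook of spatial-temporal words (7:1 preference).
p_a = np.r_[np.full(25, 7.0), np.full(25, 1.0)]
p_a /= p_a.sum()
p_b = p_a[::-1].copy()

# 30 videos per category, 200 detected words per video.
counts = np.vstack([rng.multinomial(200, p_a, size=30),
                    rng.multinomial(200, p_b, size=30)])

# Fit a 2-topic model; topics should align with the two categories.
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
theta = lda.transform(counts)   # per-video topic mixture
labels = theta.argmax(axis=1)   # unsupervised category assignment
```

No labels are used at any point: the topic mixtures alone separate the two groups, which is the essence of learning action categories without supervision.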