Proceedings of the British Machine Vision Conference 2006
DOI: 10.5244/C.20.127
Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words

Abstract: Imagine a video taken on a sunny beach: can a computer automatically tell what is happening in the scene? Can it identify different human activities in the video, such as water surfing, people walking, and lying on the beach? Automatically classifying or localizing different actions in video sequences is very useful for a variety of tasks, such as video surveillance, object-level video summarization, video indexing, digital library organization, etc. However, it remains a challenging task for computers …

Cited by 413 publications (347 citation statements). References 1 publication.
“…For example, in Laptev et al (2007), the authors propose to use event-based local motion representations (here, spatial-temporal chunks of a video corresponding to 2D + t edges) and template matching. This idea of extracting spatial-temporal features was proposed in several contributions, such as Dollar et al (2005), and then Niebles et al (2006) and Wong et al (2007), using the notion of cuboids. Another stream of approaches was inspired by the work of Serre (2006), first applied to object recognition (Mutch and Lowe 2006) and then extended to action recognition (Sigala et al 2005; Jhuang et al 2007).…”
Section: How Computer Vision Does? (mentioning)
confidence: 99%
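The cuboid features this excerpt refers to are typically extracted around maxima of the periodic spatio-temporal interest-point detector of Dollar et al (2005): spatial Gaussian smoothing followed by a quadrature pair of 1D temporal Gabor filters. A minimal sketch of that response function, assuming a (T, H, W) grayscale video and illustrative parameter values:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, convolve1d

def cuboid_response(video, sigma=2.0, tau=1.5):
    """Response function of the periodic spatio-temporal detector of
    Dollar et al (2005). `video` is a (T, H, W) float array; the
    parameter defaults here are illustrative assumptions."""
    # Spatial smoothing of each frame (no temporal blur on axis 0).
    smoothed = gaussian_filter(video, sigma=(0, sigma, sigma))
    # Quadrature pair of 1D temporal Gabor filters.
    half = 3 * int(np.ceil(tau))
    t = np.arange(-half, half + 1)
    omega = 4.0 / tau                      # frequency tied to tau, as in the paper
    window = np.exp(-t**2 / tau**2)
    h_ev = -np.cos(2 * np.pi * t * omega) * window
    h_od = -np.sin(2 * np.pi * t * omega) * window
    even = convolve1d(smoothed, h_ev, axis=0)
    odd = convolve1d(smoothed, h_od, axis=0)
    # Local maxima of R mark spatio-temporal interest points;
    # cuboids are the video patches cut out around them.
    return even**2 + odd**2
```

Regions with periodic or abrupt intensity change over time score high, while static regions score near zero, which is what makes the detector respond to motion events rather than static texture.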
“…[Zhang and Parker (2011) use an unsupervised learning algorithm with Kinect data, but create their own segmented, scripted, laboratory data set.] On the other hand, some studies have used unsupervised or semi-supervised algorithms and real-world data with other types of sensors (Krause et al 2003; Wang et al 2009; Stikic et al 2008; Niebles et al 2008; Mahdaviani and Choudhury 2008). Table 1 lists the attributes of several studies, including those cited in the previous paragraph, with a column of indicators for each attribute.…”
Section: Background Research (mentioning)
confidence: 97%
“…While we classify this as real-world data, it was not collected in a real-world setting, due to the intrusiveness of on-body sensors. Niebles et al (2008) used video segments of figure skaters. While the activities were not scripted, the authors did preselect video segments. [Table note: an "x" indicates that the research includes the attribute.]…”
Section: Background Research (mentioning)
confidence: 99%
“…They were used to solve the problems of scene categorization (Fei-Fei and Perona 2005; Sudderth et al 2007), object recognition (Sivic et al 2005; Sudderth et al 2007), human action recognition (Niebles et al 2006; Niebles and Fei-Fei 2007), and video analysis (Wang et al 2009). Fox and Willsky (2006) used a Dirichlet process to solve the problem of data association for multi-target tracking in the presence of an unknown number of targets.…”
Section: Related Work (mentioning)
confidence: 99%
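The unsupervised action-category discovery these citations describe represents each video as a bag of spatial-temporal words (quantized cuboid descriptors) and fits a topic model over the codeword histograms. A hedged sketch of that idea on synthetic counts — the codebook size, the synthetic data, and the use of scikit-learn's LDA are all illustrative assumptions (the original work used pLSA/LDA variants over real descriptor codebooks):

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)

# Two synthetic "action categories", each favoring one half of a
# 50-entry codebook of spatial-temporal words (7:1 preference).
p_a = np.r_[np.full(25, 7.0), np.full(25, 1.0)]
p_a /= p_a.sum()
p_b = p_a[::-1].copy()

# 30 videos per category, 200 detected words per video.
counts = np.vstack([rng.multinomial(200, p_a, size=30),
                    rng.multinomial(200, p_b, size=30)])

# Fit a 2-topic model; topics should align with the two categories.
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
theta = lda.transform(counts)   # per-video topic mixture
labels = theta.argmax(axis=1)   # unsupervised category assignment
```

No labels are used at any point: the topic mixtures alone separate the two groups, which is the essence of learning action categories without supervision.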