We describe a mid-level approach for action recognition. From an input video, we extract salient spatio-temporal structures by forming clusters of trajectories that serve as candidates for the parts of an action. The assembly of these clusters into an action class is governed by a graphical model that incorporates appearance and motion constraints for the individual parts and pairwise constraints for the spatio-temporal dependencies among them. During training, we estimate the model parameters discriminatively. During classification, we efficiently match the model to a video using discrete optimization. We validate the model's classification ability on standard benchmark datasets and illustrate its potential to support a fine-grained analysis that not only gives a label to a video, but also identifies and localizes its constituent parts.
We tackle the problem of detecting occluded regions in a video stream. Under assumptions of Lambertian reflection and static illumination, the task can be posed as a variational optimization problem, and its solution approximated using convex minimization. We describe efficient numerical schemes that reach the global optimum of the relaxed cost functional, for any number of independently moving objects, and any number of occlusion layers. We test the proposed algorithm on benchmark datasets, expanded to enable evaluation of occlusion detection performance, in addition to optical flow.
We present spatio-temporal feature descriptors that can be inferred from video and used as building blocks in action recognition systems. They capture the evolution of "elementary action elements" under a set of assumptions on the image-formation model and are designed to be insensitive to nuisance variability (absolute position, contrast), while retaining discriminative statistics due to the fine-scale motion and the local shape in compact regions of the image. Despite their simplicity, these descriptors, used in conjunction with basic classifiers, attain state-of-the-art performance in the recognition of actions in benchmark datasets.
Discriminative, or (structured) prediction, methods have proved effective for a variety of problems in computer vision; a notable example is 3D monocular pose estimation. All methods to date, however, have relied on the assumption that training (source) and test (target) data come from the same underlying joint distribution. In many real cases, including standard datasets, this assumption is flawed. In the presence of training-set bias, learning results in a biased model whose performance degrades on the (target) test set. Under the assumption of covariate shift, we propose an unsupervised domain adaptation approach to address this problem. The approach takes the form of training-instance re-weighting, where the weights are assigned based on the ratio of the test and training marginals evaluated at the training samples. Learning with the resulting weighted training samples alleviates the bias in the learned models. We show the efficacy of our approach by proposing weighted variants of Kernel Regression (KR) and Twin Gaussian Processes (TGP). We show that our weighted variants outperform their unweighted counterparts and improve on the state-of-the-art performance on the public HumanEva dataset.
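The re-weighting idea above can be sketched in a few lines. The following toy example assumes a 1-D regression problem where the true training and test densities are known Gaussians (in practice they would have to be estimated, e.g. by density-ratio estimation); the Nadaraya-Watson regressor and the bandwidth are illustrative stand-ins, not the paper's weighted KR or TGP.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy covariate-shift setup (illustrative, not the paper's pose data):
# training inputs drawn from N(0,1), test inputs from N(1,1),
# shared regression function f(x) = sin(x).
x_tr = rng.normal(0.0, 1.0, 200)
y_tr = np.sin(x_tr) + 0.1 * rng.normal(size=200)
x_te = rng.normal(1.0, 1.0, 200)

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Importance weights: ratio of the test and training marginals evaluated
# at the training samples (known in closed form in this toy setting).
w = gauss_pdf(x_tr, 1.0, 1.0) / gauss_pdf(x_tr, 0.0, 1.0)

def predict(x, h=0.3):
    # Weighted Nadaraya-Watson regression: each training sample's kernel
    # contribution is scaled by its importance weight.
    k = gauss_pdf(x[:, None], x_tr[None, :], h) * w[None, :]
    return (k @ y_tr) / k.sum(axis=1)

y_hat = predict(x_te)
```

Here the weights grow as `exp(x - 0.5)`, so training samples that fall where the test distribution is dense are up-weighted, which is exactly the corrective effect covariate-shift re-weighting is meant to have.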
The goal of cross-domain matching (CDM) is to find correspondences between two sets of objects in different domains in an unsupervised way. CDM has various interesting applications, including photo album summarization, where photos are automatically aligned into a designed frame expressed in the Cartesian coordinate system, and temporal alignment, which aligns sequences such as videos that are potentially expressed using different features. In this paper, we propose an information-theoretic CDM framework based on squared-loss mutual information (SMI). The proposed approach can directly handle non-linearly related objects/sequences with different dimensions, and its hyper-parameters can be objectively optimized by cross-validation. We apply the proposed method to several real-world problems including image matching, unpaired voice conversion, photo album summarization, cross-feature video and cross-domain video-to-mocap alignment, and Kinect-based action recognition, and experimentally demonstrate that the proposed method is a promising alternative to state-of-the-art CDM methods.
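To make the matching problem concrete, here is a toy sketch of unsupervised correspondence search between two small sets living in different feature spaces. It scores a candidate permutation with an HSIC-style kernel dependence measure, which is only a crude stand-in for the paper's SMI criterion, and brute-forces all permutations, which is feasible only for tiny sets; all data, kernels, and parameters are invented for illustration.

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(2)
n = 6

# Toy cross-domain setup: Y is a nonlinear, higher-dimensional
# transform of X, presented in shuffled order.
X = rng.normal(size=(n, 2))
true_perm = rng.permutation(n)
Y = np.tanh(X[true_perm] @ rng.normal(size=(2, 3)))

def gram(Z, sigma=1.0):
    """Gaussian-kernel Gram matrix of the rows of Z."""
    d = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d / (2 * sigma ** 2))

H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
K = H @ gram(X) @ H
L = gram(Y)

def score(p):
    # Dependence between X and the re-ordered Y -- a stand-in for SMI.
    return np.trace(K @ H @ L[np.ix_(p, p)] @ H)

# Brute-force search over all n! candidate correspondences.
best_perm = max((np.array(p) for p in permutations(range(n))),
                key=score)
```

A real CDM method replaces both pieces: the dependence measure (SMI, estimated by density-ratio methods, with cross-validated hyper-parameters) and the combinatorial search (an alternating or relaxation-based optimizer rather than enumeration).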
We investigate dynamical models of human motion that can support both synthesis and analysis tasks. Unlike coarser discriminative models that work well when action classes are nicely separated, we seek models that have fine-scale representational power and can therefore model subtle differences in the way an action is performed. To this end, we model an observed action as an (unknown) linear time-invariant dynamical model of relatively small order, driven by a sparse bounded input signal. Our motivating intuition is that the time-invariant dynamics will capture the unchanging physical characteristics of an actor, while the inputs used to excite the system will correspond to a causal signature of the action being performed. We show that our model has sufficient representational power to closely approximate large classes of non-stationary actions with significantly reduced complexity. We also show that temporal statistics of the inferred input sequences can be compared in order to recognize actions and detect transitions between them.
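The modeling intuition can be illustrated with a minimal scalar example: a stable first-order LTI system excited by a few isolated spikes. This toy assumes the dynamics are known and the data noise-free, so the sparse input can be read off by inverting the recursion; the paper's setting is harder, since both the dynamics and the input must be inferred jointly.

```python
import numpy as np

T = 100

# Illustrative scalar LTI system (not the paper's learned models):
# x_{t+1} = a * x_t + u_t, observed directly, with a stable pole a.
a = 0.9

# Sparse, bounded input: a few spikes marking hypothetical "action events".
u = np.zeros(T)
u[[10, 40, 70]] = [1.0, -0.8, 0.6]

# Simulate the system forward.
x = np.zeros(T)
for t in range(T - 1):
    x[t + 1] = a * x[t] + u[t]

# Recover the input by inverting the known, noise-free dynamics.
u_hat = np.zeros(T)
u_hat[:-1] = x[1:] - a * x[:-1]
```

The recovered `u_hat` is exactly the sparse excitation, while the decaying responses between spikes are explained entirely by the time-invariant dynamics, mirroring the split the abstract describes between actor physics and action signature.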