Abstract: In this paper, we propose a novel space-time geometric representation of human landmark configurations and derive tools for comparison and classification. We model the temporal evolution of landmarks as parametrized trajectories on the Riemannian manifold of positive semidefinite matrices of fixed rank. Our representation has the benefit of naturally bringing a second desirable quantity when comparing shapes, the spatial covariance, in addition to the conventional affine-shape representation. We derived then geom…
“…More recently, Kacem et al. [18] proposed a geometric approach for modeling and classifying dynamic 2D and 3D landmark sequences based on Gramian matrices derived from the static landmarks. This results in an affine-invariant representation of the data.…”
Section: Related Work (mentioning)
confidence: 99%
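The Gramian representation mentioned in the quotation above can be illustrated with a minimal NumPy sketch. It assumes a landmark configuration stored as an n × d array and forms G = Z Zᵀ after centering; the centering step, the function name `gram_matrix`, and the toy data are assumptions made for illustration, not details taken from [18]. The sketch only checks invariance to rotation and the rank bound, not the full affine invariance described in the quotation.

```python
import numpy as np

def gram_matrix(landmarks):
    """Return the n x n Gram matrix Z Z^T of an (n, d) landmark array.

    Centering the landmarks first is an assumption of this sketch.
    """
    Z = np.asarray(landmarks, dtype=float)
    Z = Z - Z.mean(axis=0, keepdims=True)
    return Z @ Z.T

# Toy configuration: 5 landmarks in 2D (hypothetical data).
rng = np.random.default_rng(0)
Z = rng.standard_normal((5, 2))

# Rotate the configuration by an arbitrary angle.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

G = gram_matrix(Z)
G_rot = gram_matrix(Z @ R.T)

# The Gram matrix is unchanged by the rotation, is positive semidefinite,
# and has rank at most d = 2.
print(np.allclose(G, G_rot))
print(np.linalg.matrix_rank(G))
```

Because the rank of G is bounded by the landmark dimension d, a sequence of such matrices can be read as a trajectory on a manifold of fixed-rank positive-semidefinite matrices, which is the setting the quoted papers work in.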
“…The sequences represented in this manifold can be of different lengths, as the execution rate of the actions can vary from one person to another, meaning that we cannot effectively compare them. A common method to do so is to use Dynamic Time Warping (DTW) as proposed in several works [3,18,15]. However, DTW does not define a proper metric and cannot be used to derive a valid positive-definite kernel for the classification phase.…”
Section: Global Alignment (mentioning)
confidence: 99%
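To make the remark about DTW concrete, here is a minimal sketch of the textbook DTW recursion applied to sequences of matrices, with the Frobenius norm as the local distance. The function name `dtw_distance`, the choice of local distance, and the toy sequences are assumptions for illustration; the cited works may use a different distance on the manifold.

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Textbook DTW cost between two sequences of equally shaped arrays.

    The local distance is the Frobenius norm (an assumption of this sketch).
    """
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(np.asarray(seq_a[i - 1]) - np.asarray(seq_b[j - 1]))
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

I2 = np.eye(2)
a = [0 * I2, 1 * I2, 2 * I2]          # three frames
b = [0 * I2, 1 * I2, 1 * I2, 2 * I2]  # same motion, one frame repeated
c = [0 * I2, 2 * I2]                  # middle frame missing

d_ab = dtw_distance(a, b)
d_ac = dtw_distance(a, c)
print(d_ab, d_ac)
```

Here `a` and `b` are different sequences, yet their DTW cost is zero because the repeated frame is absorbed by the warping. This is one simple way to see that DTW is not a proper metric: distinct sequences can be at distance zero.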
“…Overall, our approach achieves competitive results with respect to most recent approaches. We directly compare our results with [18] as we work on the same geometric space, the S^+(d, n) manifold. The main differences between our method and the method in [18] are the use of a different metric and of the Global Alignment Kernel instead of DTW.…”
Section: UTKinect-Action3D Dataset (mentioning)
confidence: 99%
“…We directly compare our results with [18] as we work on the same geometric space, the S^+(d, n) manifold. The main differences between our method and the method in [18] are the use of a different metric and of the Global Alignment Kernel instead of DTW. Our metric is simpler than the metric in [18], as we do not have to estimate the parameter k used in Eq.…”
Section: UTKinect-Action3D Dataset (mentioning)
confidence: 99%
“…This paper lies in the continuity of recent works that model the comparison and classification of temporal sequences of landmarks on the Riemannian manifold of positive-semidefinite matrices. Building on the work [18], our approach involves four different steps: 1) We build a trajectory on the Riemannian manifold from the body skeletons; 2) We apply a curve fitting algorithm on the trajectories to denoise the data points; 3) We perform a temporal alignment using a Global Alignment Kernel, defining a positive-semidefinite kernel; 4) Finally, we use this kernel with a classic SVM to classify the actions. An overview of the full approach is given in Fig.…”
In this paper, we tackle the problem of action recognition using body skeletons extracted from video sequences. Our approach lies in the continuity of recent works representing video frames by Gramian matrices that describe a trajectory on the Riemannian manifold of positive-semidefinite matrices of fixed rank. Compared to previous work, the manifold of fixed-rank positive-semidefinite matrices is endowed with a different metric, and we resort to different algorithms for the curve fitting and temporal alignment steps. We evaluated our approach on three publicly available datasets (UTKinect-Action3D, KTH-Action and UAV-Gesture). The results of the proposed approach are competitive with respect to state-of-the-art methods, while only involving body skeletons.
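The temporal alignment step described above (step 3 of the quoted pipeline) can be sketched as follows. This is a simplified global alignment kernel that sums, over all monotone alignments of two sequences, the product of local kernel values along the alignment. The plain Gaussian local kernel on the Frobenius distance, the bandwidth `sigma`, the normalization, and all function names are assumptions of this sketch; the citing paper's metric on the manifold and its exact kernel are not reproduced here, and the sketch does not verify positive definiteness.

```python
import numpy as np

def local_kernel(x, y, sigma=1.0):
    """Gaussian kernel on the squared Frobenius distance (an assumption)."""
    d2 = np.sum((np.asarray(x, dtype=float) - np.asarray(y, dtype=float)) ** 2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def global_alignment_kernel(seq_a, seq_b, sigma=1.0):
    """Sum over all alignments of the product of local kernel values."""
    n, m = len(seq_a), len(seq_b)
    M = np.zeros((n + 1, m + 1))
    M[0, 0] = 1.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            k = local_kernel(seq_a[i - 1], seq_b[j - 1], sigma)
            # Accumulate contributions from the three predecessor cells.
            M[i, j] = k * (M[i - 1, j] + M[i, j - 1] + M[i - 1, j - 1])
    return M[n, m]

def gak_gram(sequences, sigma=1.0):
    """Normalized kernel matrix over a list of variable-length sequences."""
    n = len(sequences)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            K[i, j] = K[j, i] = global_alignment_kernel(sequences[i], sequences[j], sigma)
    diag = np.sqrt(np.diag(K))
    return K / np.outer(diag, diag)

# Three hypothetical trajectories of 3 x 3 matrices with different lengths.
rng = np.random.default_rng(0)
sequences = [rng.standard_normal((length, 3, 3)) for length in (4, 6, 5)]
K = gak_gram(sequences, sigma=2.0)
print(K.shape)
```

Unlike DTW, which keeps only the cheapest alignment, this recursion accumulates every alignment. A matrix such as `K` could then be handed to an SVM that accepts precomputed kernels (for example scikit-learn's `SVC(kernel="precomputed")`), which corresponds to step 4 of the quoted pipeline. For long sequences the products can underflow, so a log-domain version of the recursion would be preferable in practice.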
Video surveillance has shown encouraging outcomes to monitor human activities and prevent crimes in real time. To this extent, violence detection (VD) has received substantial attention from the research community due to its vast applications, such as ensuring security over public areas and industrial settings through smart machine intelligence. However, because of changing illumination, complex background and low resolution, the analysis of violence patterns remains challenging in the industrial video surveillance domain. In this paper, we propose a computationally intelligent VD approach to precisely detect violent scenes through deep analysis of surveillance video sequential patterns. First, the video stream acquired through the vision sensor is processed by a lightweight convolutional neural network (CNN) for the segmentation of important shots. Next, temporal optical flow features are extracted from the informative shots via a residential optical flow CNN. These are concatenated with appearance‐invariant features extracted from a Darknet CNN model. Finally, a multilayer long short‐term memory network is plugged to generate the final feature map for learning the violence patterns in a sequence of frames. In addition, we contribute to the existing surveillance VD data set by considering its indoor and outdoor scenarios separately for the proposed method's evaluation, achieving a 2% increase in accuracy over surveillance fight data set. Experiments also show encouraging results over the state of the art on other challenging benchmark data sets.