Abstract. Physical activity is essential for stroke survivors for recovering some autonomy in daily life activities. Post-stroke patients are initially subject to physical therapy under the supervision of a health professional, but due to economical aspects, home based rehabilitation is eventually suggested. In order to support the physical activity of stroke patients at home, this paper presents a system for guiding the user in how to properly perform certain actions and movements. This is achieved by presenting feedback in form of visual information and human-interpretable messages. The core of the proposed approach is the analysis of the motion required for aligning body-parts with respect to a template skeleton pose, and how this information can be presented to the user in form of simple recommendations. Experimental results in three datasets show the potential of the proposed framework.
Humans use facial expressions successfully for conveying their emotional states. However, replicating such success in the human-computer interaction domain is an active research problem. In this paper, we propose deep convolutional neural network (DCNN) for joint learning of robust facial expression features from fused RGB and depth map latent representations. We posit that learning jointly from both modalities result in a more robust classifier for facial expression recognition (FER) as opposed to learning from either of the modalities independently. Particularly, we construct a learning pipeline that allows us to learn several hierarchical levels of feature representations and then perform the fusion of RGB and depth map latent representations for joint learning of facial expressions. Our experimental results on the BU-3DFE dataset validate the proposed fusion approach, as a model learned from the joint modalities outperforms models learned from either of the modalities.
Abstract-In this paper, we introduce a deformation based representation space for curved shapes in R n . Given an ordered set of points sampled from a curved shape, the proposed method represents the set as an element of a finite dimensional matrix Lie group. Variation due to scale and location are filtered in a preprocessing stage, while shapes that vary only in rotation are identified by an equivalence relationship. The use of a finite dimensional matrix Lie group leads to a similarity metric with an explicit geodesic solution. Subsequently, we discuss some of the properties of the metric and its relationship with a deformation by least action. Furthermore, invariance to reparametrization or estimation of point correspondence between shapes is formulated as an estimation of sampling function. Thereafter, two possible approaches are presented to solve the point correspondence estimation problem. Finally, we propose an adaptation of k-means clustering for shape analysis in the proposed representation space. Experimental results show that the proposed representation is robust to uninformative cues, e.g. local shape perturbation and displacement. In comparison to state of the art methods, it achieves a high precision on the Swedish and the Flavia leaf datasets and a comparable result on MPEG-7, Kimia99 and Kimia216 datasets.
Some of the main challenges in skeleton-based action recognition systems are redundant and noisy pose transformations. Earlier works in skeleton-based action recognition explored different approaches for filtering linear noise transformations, but neglect to address potential nonlinear transformations. In this paper, we present an unsupervised learning approach for estimating nonlinear noise transformations in pose estimates. Our approach starts by decoupling linear and nonlinear noise transformations. While the linear transformations are modelled explicitly the nonlinear transformations are learned from data. Subsequently, we use an autoencoder with L 2-norm reconstruction error and show that it indeed does capture nonlinear noise transformations, and recover a denoised pose estimate which in turn improves performance significantly. We validate our approach on a publicly available dataset, NW-UCLA.
In recent years 3D models of buildings are used in maintenance and inspection, preservation, and other building related applications. However, the usage of these models is limited because most models are pure representations with no or little associated semantics. In this paper we present a pipeline of techniques used for interior interpretation, object detection, and adding energy related semantics to windows of a 3D thermal model. A sequence of algorithms is presented for building the fundamental semantics of a 3D model. Among other things, these algorithms enable the system to differentiate between objects in a room and objects that are part of the room, e.g. floor, windows. Subsequently, the thermal information is used to construct a stochastic mathematical modelnamely Markov Random Field-of the temperature distribution of the detected windows. As a result, the MAP(Maximum a posteriori) framework is used to further label the windows as either open, closed or damaged based upon their temperature distribution.
In this paper, we introduce a similarity metric for curved shapes that can be described, distinctively, by ordered points. The proposed method represents a given curve as a point in the deformation space, the direct product of rigid transformation matrices, such that the successive action of the matrices on a fixed starting point reconstructs the full curve. In general, both open and closed curves are represented in the deformation space modulo shape orientation and orientation preserving diffeomorphisms. The use of direct product Lie groups to represent curved shapes led to an explicit formula for geodesic curves and the formulation of a similarity metric between shapes by the L 2 -norm on the Lie algebra. Additionally, invariance to reparametrization or estimation of point correspondence between shapes is performed as an intermediate step for computing geodesics. Furthermore, since there is no computation of differential quantities on the curves, our representation is more robust to local perturbations and needs no pre-smoothing. We compare our method with the elastic shape metric defined through the square root velocity (SRV) mapping, and other shape matching approaches.
The Dense Trajectories concept is one of the most successful approaches in action recognition, suitable for scenarios involving a significant amount of motion. However, due to noise and background motion, many generated trajectories are irrelevant to the actual human activity and can potentially lead to performance degradation. In this paper, we propose Localized Trajectories as an improved version of Dense Trajectories where motion trajectories are clustered around human body joints provided by RGB-D cameras and then encoded by local Bag-of-Words. As a result, the Localized Trajectories concept provides an advanced discriminative representation of actions. Moreover, we generalize Localized Trajectories to 3D by using the depth modality. One of the main advantages of 3D Localized Trajectories is that they describe radial displacements that are perpendicular to the image plane. Extensive experiments and analysis were carried out on five different datasets.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
334 Leonard St
Brooklyn, NY 11211
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.