“…In vision-based action recognition, the common approach is to extract image features from video data and to assign a corresponding action class label (Poppe, 2010; Babiker et al., 2018). Nevertheless, when a skeleton representation of the human body is used, the preferred discriminative features are either the raw data coming from skeletal tracking (joint spatial coordinates) (Patsadu et al., 2012; Youness and Abdelhak, 2016) or indices expressing geometric relations between certain body points, such as: the vertical distance from the hip joint to the room floor (Visutarrom et al., 2014, 2015); the distance between the right toe and the plane spanned by the left ankle, the left hip and the foot for a fixed pose (Müller et al., 2005); the distance between two joints, two body segments, or a joint and a body segment (Yang and Tian, 2014); the relative angle between two segments within the body kinematic chain (Müller et al., 2005); and, finally, the size of the 3D bounding box enclosing the body skeleton (Bevilacqua et al., 2014). Geometric features are synthetic in the sense that each expresses a single geometric aspect, which makes them particularly robust to spatial variations that are uncorrelated with the aspect of interest (Müller et al., 2005).…”
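The geometric features enumerated above can be made concrete with a short sketch. The snippet below is illustrative only: the joint names and coordinates are hypothetical stand-ins (not taken from any cited dataset or tracker), and each helper implements one of the feature families mentioned in the text — joint-to-floor height, point-to-plane distance, inter-joint distance, segment angle, and bounding-box size.

```python
import numpy as np

# Hypothetical 3D joint positions in metres (y is the vertical axis).
# Names and values are illustrative, not from any cited dataset.
joints = {
    "hip":        np.array([0.0, 1.0, 0.0]),
    "left_ankle": np.array([-0.1, 0.1, 0.0]),
    "left_hip":   np.array([-0.1, 1.0, 0.0]),
    "left_foot":  np.array([-0.1, 0.0, 0.2]),
    "right_toe":  np.array([0.2, 0.0, 0.3]),
    "shoulder":   np.array([0.0, 1.5, 0.0]),
    "elbow":      np.array([0.3, 1.5, 0.0]),
    "wrist":      np.array([0.3, 1.2, 0.0]),
}

def joint_distance(a, b):
    """Euclidean distance between two joints (cf. Yang and Tian, 2014)."""
    return float(np.linalg.norm(a - b))

def vertical_height(joint, floor_y=0.0):
    """Vertical distance from a joint to the room floor
    (cf. Visutarrom et al., 2014, 2015)."""
    return float(joint[1] - floor_y)

def point_plane_distance(p, a, b, c):
    """Distance from point p to the plane spanned by joints a, b, c
    (the toe-to-plane feature described by Mueller et al., 2005)."""
    n = np.cross(b - a, c - a)          # plane normal
    n = n / np.linalg.norm(n)
    return float(abs(np.dot(p - a, n)))

def segment_angle(a, b, c):
    """Angle in radians at joint b between segments b->a and b->c,
    i.e. a relative angle within the kinematic chain."""
    u, v = a - b, c - b
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def bounding_box_size(points):
    """Side lengths of the axis-aligned 3D box enclosing the skeleton
    (cf. Bevilacqua et al., 2014)."""
    pts = np.stack(points)
    return pts.max(axis=0) - pts.min(axis=0)
```

Each helper depends only on the geometric aspect it encodes — for example, `segment_angle` is invariant to where the body stands in the room — which is exactly the robustness to uncorrelated spatial variation the text attributes to such features.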