2020
DOI: 10.1109/tcsvt.2019.2890829
Jointly Learning Visual Poses and Pose Lexicon for Semantic Action Recognition

Cited by 13 publications (21 citation statements)
References 33 publications
“…Different from the papers in Ref. [3,5,45], the visual frames of each body part are dependent only on the visual poses of the same body part since the visual frames are generated from visual poses. Therefore, P (X|S,T ) is formulated as…”
Section: Proposed Methods
Confidence: 96%
“…It can be seen that visual frames of each body part are generated from the visual poses of the same body part through the visual pose model. Hence, multiple visual pose models are required to be simultaneously learnt, and it is unlike the previous methods [3,5,45] in which one visual pose model is learnt.…”
Section: Proposed Methods
Confidence: 99%
“…Skeletons can also be extracted from either depth maps [1] or RGB video [2] under certain conditions, for instance, the subjects being in a standing position and not being overly occluded. As the seminal work [3], research on action recognition [4] from RGB-D data has extensively focused on using either skeletons [5,6] or depth maps [7], some work using multiple modalities including RGB video. However, single modality alone often fails to recognize some actions, such as human-object interactions, that require both 3D geometric and appearance information to characterize the body movement and the objects being interacted.…”
Section: Introduction
Confidence: 99%