2020
DOI: 10.1109/tcsvt.2019.2890829
Jointly Learning Visual Poses and Pose Lexicon for Semantic Action Recognition

Cited by 13 publications (21 citation statements)
References 33 publications
“…Different from the papers in Ref. [3,5,45], the visual frames of each body part are dependent only on the visual poses of the same body part since the visual frames are generated from visual poses. Therefore, P (X|S,T ) is formulated as…”
Section: Proposed Methods
Confidence: 96%
“…It can be seen that visual frames of each body part are generated from the visual poses of the same body part through the visual pose model. Hence, multiple visual pose models are required to be simultaneously learnt, and it is unlike the previous methods [3,5,45] in which one visual pose model is learnt.…”
Section: Proposed Methods
Confidence: 99%
“…Skeletons can also be extracted from either depth maps [1] or RGB video [2] under certain conditions, for instance, the subjects being in a standing position and not being overly occluded. As the seminal work [3], research on action recognition [4] from RGB-D data has extensively focused on using either skeletons [5,6] or depth maps [7], some work using multiple modalities including RGB video. However, single modality alone often fails to recognize some actions, such as human-object interactions, that require both 3D geometric and appearance information to characterize the body movement and the objects being interacted.…”
Section: Introduction
Confidence: 99%