Action Recognition with 3D ConvNet-GRU Architecture

Yao, Guangle; Liu, Xianyuan; Lei, Tao

doi:10.1145/3265639.3265672

Cited by 14 publications

(9 citation statements)

References 28 publications


“…In [13], a 2D CNN-LSTM architecture, where, in parallel with the LSTMs, it also uses a weakly supervised gloss-detection regularization network, consisting of stacked temporal 1D convolutions. A simpler variant of LSTMs, Gated-recurrent Units (GRU) [11], which consist of only two gates (update and reset gates), and have the internal state (output state) fully exposed, have also been used for temporal modelling [54].…”

Section: Image Appearance Based Methods (mentioning)

confidence: 99%

Pose-based Sign Language Recognition using GCN and BERT

Tunga, Vidyaranya, Juan. 2020. Preprint.

Sign language recognition (SLR) plays a crucial role in bridging the communication gap between the hearing and vocally impaired community and the rest of the society. Word-level sign language recognition (WSLR) is the first important step towards understanding and interpreting sign language. However, recognizing signs from videos is a challenging task as the meaning of a word depends on a combination of subtle body motions, hand configurations and other movements. Recent pose-based architectures for WSLR either model both the spatial and temporal dependencies among the poses in different frames simultaneously or only model the temporal information without fully utilizing the spatial information. We tackle the problem of WSLR using a novel pose-based approach, which captures spatial and temporal information separately and performs late fusion. Our proposed architecture explicitly captures the spatial interactions in the video using a Graph Convolutional Network (GCN). The temporal dependencies between the frames are captured using Bidirectional Encoder Representations from Transformers (BERT). Experimental results on WLASL, a standard word-level sign language recognition dataset, show that our model significantly outperforms the state-of-the-art on pose-based methods by achieving an improvement in the prediction accuracy by up to 5%.
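The citation statement for this entry describes the GRU as having only two gates (update and reset) with its internal state exposed directly as the output. As a rough illustration of that description, here is a minimal sketch of one GRU step in plain Python. The function names (`gru_step`, `zero_params`), the parameter dictionary keys, and the interpolation convention are assumptions made for this example and are not taken from any of the cited papers; published formulations differ on whether the update gate weights the old state or the candidate.

```python
import math


def sigmoid(x):
    """Logistic function used by both gates."""
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def affine(W, x, U, h, b):
    """Compute W x + U h + b element-wise over the hidden units."""
    wx, uh = matvec(W, x), matvec(U, h)
    return [a + c + d for a, c, d in zip(wx, uh, b)]


def gru_step(x, h, p):
    """One GRU step: returns the new hidden state, which is also the output.

    p is a hypothetical parameter dict with keys W_*, U_*, b_* for the
    update gate (z), reset gate (r), and candidate state (h).
    """
    # Update gate: how much of the candidate replaces the old state.
    z = [sigmoid(a) for a in affine(p["W_z"], x, p["U_z"], h, p["b_z"])]
    # Reset gate: how much of the old state feeds the candidate.
    r = [sigmoid(a) for a in affine(p["W_r"], x, p["U_r"], h, p["b_r"])]
    gated_h = [ri * hi for ri, hi in zip(r, h)]
    h_cand = [math.tanh(a)
              for a in affine(p["W_h"], x, p["U_h"], gated_h, p["b_h"])]
    # Assumed convention: h_new = (1 - z) * h + z * h_cand.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, h_cand)]


def zero_params(input_size, hidden_size):
    """All-zero parameters, handy for checking the arithmetic by hand."""
    p = {}
    for g in ("z", "r", "h"):
        p["W_" + g] = [[0.0] * input_size for _ in range(hidden_size)]
        p["U_" + g] = [[0.0] * hidden_size for _ in range(hidden_size)]
        p["b_" + g] = [0.0] * hidden_size
    return p
```

With all-zero parameters both gates evaluate to 0.5 and the candidate state is 0, so the new state is half the previous one. There is no separate cell state or output gate, which is the sense in which the state is "fully exposed".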



“…When taking video sequences as input, typical problems are the classification of actions or motion planning. Here DCNN networks are used (for example the VGG [51] architecture [52]), as well as Self Organizing Maps network which in [53] receives information about the human's location and pose in a robot work space based on pressure activated notes in a safety mat.…”

Section: E Research Questions (mentioning)

confidence: 99%

Machine Vision in the Context of Robotics: A Systematic Literature Review

Ghofrani, Kirschne, Rossburg et al. 2019. Preprint.

Machine vision is critical to robotics due to a wide range of applications which rely on input from visual sensors such as autonomous mobile robots and smart production systems. To create the smart homes and systems of tomorrow, an overview about current challenges in the research field would be of use to identify further possible directions, created in a systematic and reproducible manner. In this work a systematic literature review was conducted covering research from the last 10 years. We screened 172 papers from four databases and selected 52 relevant papers. While robustness and computation time were improved greatly, occlusion and lighting variance are still the biggest problems faced. From the number of recent publications, we conclude that the observed field is of relevance and interest to the research community. Further challenges arise in many areas of the field.


“…For action representation, the solutions that have achieved the most success are based upon optical flows [4], point clouds [5], convolutional neural networks (CNN) [6,7] and landmark detection of the main joints of the human body (i.e., skeleton-data) [8]. On the other hand, for action classification, previous attempts vary from random forests [9], to recurrent neural networks (RNN) [10,11] and more recently, graph neural networks (GNN).…”

Section: Introduction (mentioning)

confidence: 99%

Using BlazePose on Spatial Temporal Graph Convolutional Networks for Action Recognition

Alsawadi, El-kenawy, Rio. 2023. Computers, Materials & Continua.

The ever-growing available visual data (i.e., uploaded videos and pictures by internet users) has attracted the research community's attention in the computer vision field. Therefore, finding efficient solutions to extract knowledge from these sources is imperative. Recently, the BlazePose system has been released for skeleton extraction from images oriented to mobile devices. With this skeleton graph representation in place, a Spatial-Temporal Graph Convolutional Network can be implemented to predict the action. We hypothesize that just by changing the skeleton input data for a different set of joints that offers more information about the action of interest, it is possible to increase the performance of the Spatial-Temporal Graph Convolutional Network for HAR tasks. Hence, in this study, we present the first implementation of the BlazePose skeleton topology upon this architecture for action recognition. Moreover, we propose the Enhanced-BlazePose topology that can achieve better results than its predecessor. Additionally, we propose different skeleton detection thresholds that can improve the accuracy performance even further. We reached a top-1 accuracy performance of 40.1% on the Kinetics dataset. For the NTU-RGB+D dataset, we achieved 87.59% and 92.1% accuracy for Cross-Subject and Cross-View evaluation criteria, respectively.

