2020
DOI: 10.1016/j.patrec.2020.01.023
Hierarchical Attention Network for Action Segmentation

Abstract: Temporal segmentation of events is an essential task and a precursor for the automatic recognition of human actions in video. Several attempts have been made to capture frame-level salient aspects through attention, but they lack the capacity to effectively map the temporal relationships between frames, as they only capture a limited span of temporal dependencies. To this end, we propose a complete end-to-end supervised learning approach that can better learn relationships between actions over time, th…

Cited by 5 publications (1 citation statement)
References 39 publications
“…These ResNet models are less complex (in terms of parameters) compared to previous pre-trained networks such as VGG networks, where even the deeper ResNet network (i.e. ResNet152) is less complex (11.3 Due to the aforementioned advantages, pre-trained ResNet networks have been widely used as a feature extraction backbone within both the action recognition domain [144], [145], [146], [46], [147] and related problem domains [148], [149], [150].…”
Section: Appendix A, Feature Extraction
Confidence: 99%