2021
DOI: 10.1007/s11042-021-10633-5
Attention-based encoder-decoder networks for workflow recognition

Cited by 5 publications (3 citation statements)
References 34 publications
“…In the context of manufacturing, a limited number of papers were found, with notable works being [6, 7, 33–37]. The application of classic machine learning models was predominant in these papers, where models such as Hidden Markov Models [6, 34, 35] or Support Vector Machines [7] were employed following the manual extraction of features. Makantasis et al. [33] applied a deep learning model based on a 2D convolutional neural network and multi-layer perceptron, using features created manually with the Motion History Image algorithm.…”
Section: Related Work
confidence: 99%
“…Makantasis et al. [33] applied a deep learning model based on a 2D convolutional neural network and multi-layer perceptron, using features created manually with the Motion History Image algorithm. Zhang et al. [36] proposed an encoder-decoder framework for workflow recognition using a 3D CNN, transforming the activations of the last convolutional layer into clip-level representations, which were then fed into an LSTM network with an attention mechanism for enhanced recognition. In [37], a three-stage system was developed: spatial feature extraction using a Vectors Assembly Graph (VAG) and graph networks from RGB-D video frames; contact force feature extraction via a sliding-window technique; and action segmentation through a multi-stage temporal convolution network (MS-TCN) that combines these features.…”
Section: Related Work
confidence: 99%