2021
DOI: 10.1109/jsen.2021.3062261

Inertial Sensor Data to Image Encoding for Human Action Recognition

Abstract: Convolutional Neural Networks (CNNs) are successful deep learning models in the field of computer vision. To get the maximum advantage of CNN models for Human Action Recognition (HAR) using inertial sensor data, in this paper, we use four types of spatial domain methods for transforming inertial sensor data into activity images, which are then utilized in a novel fusion framework. These four types of activity images are Signal Images (SI), Gramian Angular Field (GAF) Images, Markov Transition Field (MTF) Images a…
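
For readers unfamiliar with these encodings, the sketch below shows one way a Gramian Angular Field image can be computed from a single inertial channel. It is a minimal illustration assuming a 1-D NumPy signal; the function name and window length are illustrative, not taken from the paper's implementation.

```python
# Minimal sketch of a Gramian Angular Summation Field (GASF) encoding for
# one inertial-sensor channel; names and parameters are illustrative.
import numpy as np

def gasf_image(signal: np.ndarray) -> np.ndarray:
    """Encode a 1-D signal as a GASF image: rescale to [-1, 1], map each
    sample to a polar angle, and form the matrix cos(phi_i + phi_j)."""
    x = np.asarray(signal, dtype=float)
    # Min-max rescale to [-1, 1] so arccos is defined everywhere.
    x = 2.0 * (x - x.min()) / (x.max() - x.min() + 1e-12) - 1.0
    phi = np.arccos(np.clip(x, -1.0, 1.0))
    # Outer sum of angles, then cosine, yields an N x N activity image.
    return np.cos(phi[:, None] + phi[None, :])

# Example: encode a 128-sample accelerometer window (synthetic here).
window = np.sin(np.linspace(0, 4 * np.pi, 128))
image = gasf_image(window)   # shape (128, 128), values in [-1, 1]
```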

Cited by 28 publications (4 citation statements)
References 48 publications (74 reference statements)
“…Then, they presented a fusion ResNet framework that learned correspondences between acceleration and angular-velocity features from the generated GAF image pixels. Similar work was done by the authors in [59]. In contrast to the previous work, they used four different types of activity images and made each one multimodal by convolving it with two spatial domain filters: the Prewitt filter and the high-boost filter.…”
Section: Related Work
mentioning
confidence: 55%
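
A minimal sketch of the filtering step this statement describes, assuming 2-D NumPy activity images and SciPy's standard filters; the unsharp-mask formulation of high-boost filtering, the kernel size, and the boost factor are illustrative assumptions, not values from [59].

```python
# Hedged sketch: turn one activity image into a multimodal stack by
# applying a Prewitt edge filter and a high-boost sharpening filter.
import numpy as np
from scipy import ndimage

def multimodal_stack(image: np.ndarray, boost: float = 1.5) -> np.ndarray:
    """Return a 3-channel stack: original, Prewitt edges, high-boost image."""
    # Prewitt gradient magnitude from horizontal and vertical responses.
    gx = ndimage.prewitt(image, axis=0)
    gy = ndimage.prewitt(image, axis=1)
    edges = np.hypot(gx, gy)
    # High-boost filtering: add a scaled unsharp mask to emphasize detail.
    blurred = ndimage.uniform_filter(image, size=3)
    high_boost = image + boost * (image - blurred)
    return np.stack([image, edges, high_boost], axis=-1)
```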
“…Even though depth datasets contain single-channel information, they represent object structure and background with two different colors, which makes them confusing and time-consuming to process on lightweight devices. Inertial datasets [15] provide more compact information about human actions. They contain acceleration and gyroscope readings along the x-, y-, and z-axes, which can be processed easily.…”
Section: Introduction
mentioning
confidence: 99%
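
As an illustration of how directly such 6-axis recordings can be handled before image encoding, the sketch below segments a (T, 6) array of accelerometer and gyroscope samples into fixed-length windows; the window length, stride, and sampling rate are hypothetical choices.

```python
# Minimal sketch of segmenting 6-axis inertial data (accelerometer +
# gyroscope along x, y, z) into overlapping fixed-length windows.
import numpy as np

def sliding_windows(data: np.ndarray, length: int = 128, stride: int = 64) -> np.ndarray:
    """Split a (T, 6) inertial recording into overlapping (length, 6) windows."""
    starts = range(0, data.shape[0] - length + 1, stride)
    return np.stack([data[s:s + length] for s in starts])

# Example: 10 s of synthetic data at 50 Hz -> a (500, 6) array.
recording = np.random.randn(500, 6)
windows = sliding_windows(recording)   # shape (num_windows, 128, 6)
```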
“…To address these issues, some studies have demonstrated that fusing multiple images (i.e., multimodal fusion) can achieve higher performance than using a single image [39,40]. However, effectively fusing multimodal features remains challenging.…”
Section: Introduction
mentioning
confidence: 99%
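
One common form of multimodal fusion is to encode each image with its own convolutional stream and concatenate the resulting features. The toy sketch below, assuming PyTorch, illustrates that idea; the architecture, channel widths, and class count are stand-in assumptions, not the fusion framework of any cited paper.

```python
# Hedged sketch of feature-level fusion of two image modalities in PyTorch.
import torch
import torch.nn as nn

class TwoStreamFusion(nn.Module):
    """Encode two image modalities separately, then fuse by concatenation."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        def stream():
            return nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
        self.stream_a = stream()   # e.g., image encoded from acceleration
        self.stream_b = stream()   # e.g., image encoded from angular velocity
        self.classifier = nn.Linear(16 + 16, num_classes)

    def forward(self, img_a: torch.Tensor, img_b: torch.Tensor) -> torch.Tensor:
        feats = torch.cat([self.stream_a(img_a), self.stream_b(img_b)], dim=1)
        return self.classifier(feats)

# Example: two batches of 1-channel 128 x 128 activity images.
model = TwoStreamFusion()
logits = model(torch.randn(4, 1, 128, 128), torch.randn(4, 1, 128, 128))
```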