2016
DOI: 10.1016/j.imavis.2016.06.006

Action recognition using saliency learned from recorded human gaze

Abstract: This paper addresses the problem of recognition and localization of actions in image sequences by utilizing, in the training phase only, gaze-tracking data of people watching videos depicting the actions in question. First, we learn discriminative action features at the areas of gaze fixation and train a Convolutional Network that predicts areas of fixation (i.e. salient regions) from raw image data. Second, we propose a Support Vector Machine-based method for joint recognition and localization, i…
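The pipeline sketched in the abstract — predicted saliency weighting local features, followed by a linear classifier — can be illustrated with a toy sketch. All names, feature dimensions, and weights below are hypothetical stand-ins, not the authors' trained models; a real system would use CNN-predicted fixation maps and a trained SVM.

```python
import numpy as np

def saliency_weighted_pool(region_features, saliency):
    """Pool per-region features, weighting each region by its predicted
    saliency (as a CNN fixation predictor might output). Hypothetical helper."""
    w = saliency / saliency.sum()              # normalize to a distribution
    return (region_features * w[:, None]).sum(axis=0)

def linear_action_score(pooled, weights, bias):
    """Score one action class with a linear (SVM-style) decision function."""
    return float(pooled @ weights + bias)

# Toy example: 4 image regions with 3-dimensional features each.
feats = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [1.0, 1.0, 0.0]])
sal = np.array([0.1, 0.1, 0.7, 0.1])           # third region most salient
pooled = saliency_weighted_pool(feats, sal)
score = linear_action_score(pooled, np.array([0.0, 0.0, 1.0]), 0.0)
print(round(score, 2))                          # the salient region dominates
```

Because the third region carries most of the saliency mass, its feature dominates the pooled descriptor and hence the class score, which is the intuition behind using gaze-learned saliency to focus recognition on action-relevant regions.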

Cited by 9 publications (3 citation statements); references 31 publications (56 reference statements).
“…Thi et al. in [39] proposed a method for action classification and localization by representing human action as a complex set of local features. Stefic and Patras in [40] proposed a method for action recognition using saliency learned from recorded human gaze. Instead of using gaze information as side information, they trained a model that predicts where people look when presented with image sequences.…”
Section: Image and Vision Computing
confidence: 99%
“…In recent years, several contributions in the field of action recognition have been proposed, such as in Liu, Xu, Qiu, Qing, and Tao (2016), Shi, Laganière, and Petriu (2016), and Stefic and Patras (2016). A survey of the state of the art was presented in González et al. (2015) and Ziaeefard and Bergevin (2015).…”
Section: Related Work
confidence: 99%
“…These images were then used as input to a multilayer CNN, which automatically extracted features that were fed into a multilayer perceptron for classification [21]. Stefic and Patras utilized CNNs to extract areas of gaze fixation in raw image training data as participants watched videos of multiple activities [22]. This produced strong results in identifying salient regions of images that were then used for action recognition.…”
Section: Introduction
confidence: 99%