This fully automated MR thermometry pipeline (five images/heartbeat) provides direct assessment of lesion formation in the heart during catheter-based RFA, which may improve treatment of cardiac arrhythmia by ablation. Magn Reson Med 77:673-683, 2017. © 2016 International Society for Magnetic Resonance in Medicine.
Human action recognition in video is one of the key problems in visual data interpretation. Despite intensive research, the recognition of actions with low inter-class variability remains a challenge. This paper presents a new Siamese Spatio-Temporal Convolutional Neural Network (SSTCNN) for this purpose. Applied to table tennis, it detects and recognizes 20 table tennis strokes. The model has been trained on a dedicated dataset, the so-called TTStroke-21, recorded in natural conditions at the Faculty of Sports of the University of Bordeaux. Our model takes as inputs an RGB image sequence and its computed residual Optical Flow. The proposed Siamese network architecture comprises three spatio-temporal convolutional layers, followed by a fully connected layer where the data are fused. Our method reaches an accuracy of 91.4%, against 43.1% for our baseline.
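The residual Optical Flow input mentioned above can be illustrated with a minimal sketch. This is not the authors' code: it assumes the "residual" flow is obtained by subtracting a rough camera-motion estimate (here the per-frame median flow vector) from a dense flow field, so that only player and ball motion remains.

```python
import numpy as np

def residual_flow(flow):
    """Remove the dominant (camera) motion from a dense optical flow field.

    flow: array of shape (H, W, 2) holding (dx, dy) per pixel.
    The per-frame median flow vector is used here as a rough
    camera-motion estimate and subtracted everywhere.
    """
    camera_motion = np.median(flow.reshape(-1, 2), axis=0)  # shape (2,)
    return flow - camera_motion

# Toy example: a uniform horizontal pan of (3, 0) plus one locally moving region.
flow = np.zeros((4, 4, 2))
flow[..., 0] = 3.0            # global pan affects every pixel
flow[1, 1] = [5.0, 2.0]       # local motion superimposed on the pan
res = residual_flow(flow)     # background residual is ~0, local motion survives
```

With dense flow fields from a library such as OpenCV, the same subtraction would be applied per frame before feeding the flow branch of the network.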
Computer-aided early diagnosis of Alzheimer's Disease (AD) and its prodromal form, Mild Cognitive Impairment (MCI), has been the subject of extensive research in recent years. Some recent studies have shown promising results in AD and MCI determination using structural and functional Magnetic Resonance Imaging (sMRI, fMRI), Positron Emission Tomography (PET) and Diffusion Tensor Imaging (DTI) modalities. Furthermore, fusion of imaging modalities in a supervised machine learning framework has emerged as a promising direction of research. In this paper we first review major trends in automatic classification methods, such as feature-extraction-based methods as well as deep learning approaches, in medical image analysis applied to Alzheimer's Disease diagnostics. We then propose our own algorithm for Alzheimer's Disease diagnostics, based on a convolutional neural network and fusion of the sMRI and DTI modalities on a hippocampal ROI, using data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu). Comparison with a single-modality approach shows promising results. We also propose our own data augmentation method for balancing classes of different sizes, and analyze the impact of the ROI size on the classification results.
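The class-balancing idea can be sketched as follows. This is a hedged illustration, not the paper's exact augmentation scheme: it assumes minority classes are oversampled with randomly perturbed copies (a left-right flip and a small shift) until every class matches the size of the largest one.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(patch):
    """One randomly perturbed copy of a 2D ROI patch: an optional
    left-right flip plus a small circular shift (a placeholder for
    whatever augmentation the real pipeline uses)."""
    out = patch[:, ::-1] if rng.random() < 0.5 else patch
    shift = rng.integers(-2, 3, size=2)
    return np.roll(out, shift, axis=(0, 1))

def balance_classes(classes):
    """Oversample each minority class with augmented copies until all
    classes contain as many samples as the largest one.

    classes: dict mapping label -> list of 2D patches.
    """
    target = max(len(v) for v in classes.values())
    balanced = {}
    for label, patches in classes.items():
        extra = [augment(patches[i % len(patches)])
                 for i in range(target - len(patches))]
        balanced[label] = patches + extra
    return balanced

# Toy example: class "AD" has 2 samples, class "NC" has 5.
data = {"AD": [np.ones((8, 8)) * i for i in range(2)],
        "NC": [np.ones((8, 8)) * i for i in range(5)]}
balanced = balance_classes(data)   # both classes now hold 5 samples
```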
Background Prosthetic restoration of reach and grasp function after a trans-humeral amputation requires control of multiple distal degrees of freedom in elbow, wrist and fingers. However, such a high level of amputation reduces the amount of available myoelectric and kinematic information from the residual limb. Methods To overcome these limits, we added contextual information about the target’s location and orientation, such as can now be extracted from gaze tracking by computer vision tools. For the task of picking and placing a bottle in various positions and orientations in a 3D virtual scene, we trained artificial neural networks to predict postures of an intact subject’s elbow, forearm and wrist (4 degrees of freedom) either solely from shoulder kinematics or with additional knowledge of the movement goal. Subjects then performed the same tasks in the virtual scene with distal joints predicted from the context-aware network. Results Average movement times of 1.22 s were only slightly longer than the naturally controlled movements (0.82 s). When using a kinematic-only network, movement times were much longer (2.31 s) and compensatory movements from trunk and shoulder were much larger. Integrating contextual information also gave rise to motor synergies closer to natural joint coordination. Conclusions Although notable challenges remain before applying the proposed control scheme to a real-world prosthesis, our study shows that adding contextual information to command signals greatly improves prediction of distal joint angles for prosthetic control.
Human Action Recognition is one of the key tasks in video understanding. Deep Convolutional Neural Networks (CNN) are often used for this purpose. Although they usually perform impressively, their decision interpretation remains challenging. We propose a novel visual CNN feature understanding technique. Its objective is to find salient features that played a key role in the decision making of the network. The technique only uses the features from the last convolutional layer before the fully connected layers of a trained model and builds an importance map of features. The map is propagated to the original frame, thus highlighting the regions that contribute to the final decision. The method is fast, as it does not require gradient computation as many state-of-the-art methods do. The proposed technique is applied to the Twin Spatio-Temporal 3D Convolutional Neural Network (TSTCNN), designed for Table Tennis action recognition. Feature visualization is performed at the RGB and Optical Flow branches of the network. The obtained results are compared to other visualization techniques, both in terms of human understanding and similarity metrics. The metrics show that the generated maps are similar to those obtained with the known Grad-CAM method: e.g., the Pearson Correlation Coefficient between the maps generated by Grad-CAM and our method is 0.7 ± 0.05 on RGB data and 0.72 ± 0.06 on Optical Flow data.
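The gradient-free importance-map idea can be sketched in a few lines. This is a simplified stand-in for the paper's exact computation: it assumes the map is built by aggregating the last-conv-layer activations across channels, normalizing to [0, 1], and upsampling to the frame size (here with a plain nearest-neighbour upsample).

```python
import numpy as np

def importance_map(features, frame_shape):
    """Build a feature-importance map from last-conv-layer activations.

    features: (C, h, w) activations of the last convolutional layer.
    frame_shape: (H, W) of the original frame, assumed here to be an
    integer multiple of (h, w) for the nearest-neighbour upsample.
    """
    heat = np.abs(features).mean(axis=0)                   # channel aggregate -> (h, w)
    heat = (heat - heat.min()) / (np.ptp(heat) + 1e-8)     # normalize to [0, 1]
    sy = frame_shape[0] // heat.shape[0]
    sx = frame_shape[1] // heat.shape[1]
    return np.kron(heat, np.ones((sy, sx)))                # upsample to frame size

# Toy activations: one strongly responding spatial location.
feats = np.zeros((16, 4, 4))
feats[:, 2, 3] = 5.0
m = importance_map(feats, (32, 32))   # bright 8x8 patch at the active location
```

Because only a forward pass is needed to obtain `features`, no backward pass through the network is required, which is what makes this family of methods fast compared to gradient-based ones such as Grad-CAM.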
Detecting and classifying human actions in videos is one of the current challenges in visual content analysis and mining. This paper presents a method for performing a fine-grained classification of sport actions using a Siamese Spatio-Temporal Convolutional Neural Network (SSTCNN) model. This model takes RGB images and an Optical Flow field as input data. Our first contribution is the comparison of different Optical Flow methods and a study of their influence on the classification score. We also present different normalization methods for the optical flow that drastically impact results, boosting accuracy from 44% to 74%. Our second contribution is the detection and classification of actions in videos, performed using a sliding temporal window. It leads to a satisfying score of 81.3% over the whole TTStroke-21 dataset.
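The sliding-temporal-window detection step can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: window size, stride, and the convention that class 0 is the "no stroke" background are all assumptions, and per-frame class scores stand in for the network's outputs.

```python
import numpy as np

def detect_strokes(frame_scores, win=8, stride=4, none_class=0):
    """Slide a temporal window over per-frame class scores and
    classify each window by its average score vector.

    frame_scores: (T, K) array, one score vector per frame; class
    `none_class` is treated as the 'no stroke' background.
    Returns a list of (start_frame, predicted_class) for windows
    whose winning class is an actual stroke.
    """
    T = frame_scores.shape[0]
    detections = []
    for start in range(0, T - win + 1, stride):
        avg = frame_scores[start:start + win].mean(axis=0)
        pred = int(avg.argmax())
        if pred != none_class:
            detections.append((start, pred))
    return detections

# Toy sequence: 24 frames, with a stroke of class 2 during frames 8-15.
scores = np.zeros((24, 3))
scores[:, 0] = 1.0            # background score dominates by default
scores[8:16, 2] = 3.0         # stroke class 2 dominates mid-sequence
dets = detect_strokes(scores)
```

Overlapping windows fire around the true stroke interval; in practice such detections are typically merged (e.g., by non-maximum suppression over time) before scoring against the annotations.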