Depression is a state of low mood and aversion to activity that can affect a person's thoughts, behavior, feelings and sense of well-being. In such a low mood, both facial expression and voice differ from those in normal states. In this paper, an automatic system is proposed to predict Beck Depression Inventory scores from the naturalistic facial expression of patients with depression. Firstly, features are extracted from the corresponding video and audio signals to represent the characteristics of facial and vocal expression under depression. Secondly, a dynamic feature generation method is proposed in the extracted video feature space, based on the idea of the Motion History Histogram (MHH) for 2-D video motion extraction. Thirdly, Partial Least Squares (PLS) and linear regression are applied to learn the relationship between the dynamic features and depression scales using training data, and then to predict the depression scale for unseen data. Finally, decision-level fusion is performed to combine the predictions from the video and audio modalities. The proposed approach is evaluated on the AVEC2014 dataset and experimental results demonstrate its effectiveness.
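The final step of the abstract above, decision-level fusion, can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so the weighted average used here is an assumption, as are the function names `fuse_predictions` and `rmse`, the `video_weight` parameter, and the toy scores, none of which come from the paper.

```python
from math import sqrt


def fuse_predictions(video_pred, audio_pred, video_weight=0.5):
    """Combine per-recording predictions from two modalities.

    Assumption: fusion is a weighted average; the paper may use another rule.
    """
    if len(video_pred) != len(audio_pred):
        raise ValueError("both modalities must predict the same recordings")
    if not 0.0 <= video_weight <= 1.0:
        raise ValueError("video_weight must lie in [0, 1]")
    audio_weight = 1.0 - video_weight
    return [video_weight * v + audio_weight * a
            for v, a in zip(video_pred, audio_pred)]


def rmse(pred, target):
    """Root mean squared error between predicted and reference scores."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equal length")
    return sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


# Hypothetical depression-scale predictions for three recordings.
video_pred = [12.0, 20.0, 8.0]
audio_pred = [16.0, 24.0, 4.0]
fused = fuse_predictions(video_pred, audio_pred, video_weight=0.5)
```

In practice the weight would be chosen on held-out development data rather than fixed at 0.5.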
X-ray baggage security screening is widely used to maintain aviation and transport security. Of particular interest is automated security X-ray analysis for particular classes of object such as electronics, electrical items and liquids. However, manual inspection of such items is challenging when dealing with potentially anomalous items. Here we present a dual convolutional neural network (CNN) architecture for automatic anomaly detection within complex security X-ray imagery. We leverage recent advances in region-based CNN (R-CNN), mask-based CNN (Mask R-CNN) and detection architectures such as RetinaNet to provide object localisation variants for specific object classes of interest. Subsequently, leveraging a range of established CNN object and fine-grained category classification approaches, we formulate within-object anomaly detection as a two-class problem (anomalous or benign). Whilst the best performing object localisation method achieves 97.9% mean average precision (mAP) over a six-class X-ray object detection problem, subsequent two-class anomaly/benign classification achieves 66% performance for within-object anomaly detection. Overall, this performance illustrates both the challenge and promise of object-wise anomaly detection within the context of cluttered X-ray security imagery.
Touch is a primary nonverbal communication channel used to communicate emotions or other social messages. A variety of social touches exist, including hugging, rubbing and punching. Despite its importance, this channel remains little explored in the affective computing field, as much more focus has been placed on the visual and aural channels. In this paper, we investigate the possibility of automatically discriminating between different social touch types. We propose five distinct feature sets for describing touch behaviours captured by a grid of pressure sensors. These features are then combined using the Random Forest and Boosting methods to categorize the touch gesture type. The proposed methods were evaluated on both the HAART (7 gesture types over different surfaces) and the CoST (14 gesture types over the same surface) datasets made available by the Social Touch Gesture Challenge 2015. Performance well above chance level was achieved, with accuracies of 67% on the HAART and 59% on the CoST test sets, respectively.
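To put the reported accuracies in context, the short sketch below compares them with the chance level of a uniform random guess. It assumes the gesture classes are balanced, which the abstract does not state; the names `chance_accuracy`, `reported` and `ratios` are illustrative only.

```python
def chance_accuracy(num_classes):
    """Expected accuracy of uniform random guessing over balanced classes."""
    if num_classes < 1:
        raise ValueError("num_classes must be positive")
    return 1.0 / num_classes


# (number of gesture types, reported test accuracy) taken from the abstract.
reported = {"HAART": (7, 0.67), "CoST": (14, 0.59)}

# How many times better than chance each reported accuracy is.
ratios = {name: acc / chance_accuracy(k) for name, (k, acc) in reported.items()}
```

Under the balanced-class assumption, chance is about 14.3% for the 7 HAART gestures and about 7.1% for the 14 CoST gestures.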
X-ray imagery security screening is essential to maintaining transport security against a varying profile of threat or prohibited items. Particular interest lies in the automatic detection and classification of weapons such as firearms and knives within complex and cluttered X-ray security imagery. Here, we address this problem by exploring various end-to-end object detection Convolutional Neural Network (CNN) architectures. We evaluate several leading variants spanning the Faster R-CNN, Mask R-CNN, and RetinaNet architectures to explore the transferability of such models between varying X-ray scanners with differing imaging geometries, image resolutions and material colour profiles. Whilst the limited availability of X-ray threat imagery can pose a challenge, we employ a transfer learning approach to evaluate whether such inter-scanner generalisation may exist over a multiple-class detection problem. Overall, we achieve maximal detection performance using a Faster R-CNN architecture with a ResNet101 classification network, obtaining a mean Average Precision (mAP) of 0.88 and 0.86 for three-class and two-class item detection, respectively, from varying X-ray imaging sources. Our results exhibit a remarkable degree of generalisability in terms of cross-scanner performance (mAP: 0.87, firearm detection: 0.94 AP). In addition, we examine the inherent adversarial discriminative capability of such networks using a specifically generated adversarial dataset for firearms detection. With a variable but low false positive rate, as low as 5%, this shows both the challenge and promise of such threat detection within X-ray security imagery.
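The mAP figures quoted above summarise ranked detection quality per class. The sketch below shows one common, non-interpolated definition of average precision; the exact evaluation protocol used in the paper is not given in the abstract and may differ (for example, interpolated precision or specific IoU thresholds). It assumes detections have already been matched to ground truth, and the toy scores are invented for illustration.

```python
def average_precision(scored_detections, num_ground_truth):
    """Non-interpolated AP for one class.

    scored_detections: list of (confidence, is_true_positive) pairs, where the
    true-positive flag comes from an upstream matching step (assumed done).
    num_ground_truth: number of ground-truth objects of this class.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    # Rank detections from most to least confident.
    ranked = sorted(scored_detections, key=lambda d: d[0], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            true_positives += 1
            # Precision measured at the rank of each true positive.
            precision_sum += true_positives / rank
    return precision_sum / num_ground_truth


def mean_average_precision(per_class):
    """Mean of per-class AP; per_class maps name -> (detections, num_gt)."""
    aps = [average_precision(dets, n_gt) for dets, n_gt in per_class.values()]
    return sum(aps) / len(aps)


# Hypothetical toy detections for two weapon classes.
toy = {
    "firearm": ([(0.9, True), (0.8, False), (0.7, True)], 2),
    "knife": ([(0.95, True), (0.6, True)], 2),
}
toy_map = mean_average_precision(toy)
```

Missed ground-truth objects lower AP because the sum is divided by the number of ground-truth objects, not by the number of detections.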
With the rapid development of augmented reality (AR) and virtual reality (VR) technology, human-computer interaction (HCI) has greatly improved for gaming and for AR and VR control. The finger micro-gesture is one of the important interactive methods for HCI applications, as in the Google Soli and Microsoft Kinect projects. However, progress in this research is slow due to the lack of high-quality publicly available databases. In this paper, a holoscopic 3D camera is used to capture high-quality micro-gesture images, and a new holoscopic 3D micro-gesture (HoMG) database is produced. The principle of the holoscopic 3D camera is based on the fly's viewing system for seeing objects. The HoMG database records image sequences of 3 conventional gestures from 40 participants under different settings and conditions. For the purpose of micro-gesture recognition, HoMG has a video subset with 960 videos and a still image subset with 30635 images. Initial micro-gesture recognition on both subsets has been conducted using traditional 2D image and video features and popular classifiers, and some encouraging performance has been achieved. The database will be made available to the research community to speed up research in this area.