Pain sensation is essential for survival, since it draws attention to physical threats to the body. Pain is usually assessed through self-reports. However, self-reports are not available for noncommunicative patients, so observer reports must be relied upon instead. Observer reports of pain can be prone to errors caused by the observers' subjective biases. Moreover, continuous monitoring by humans is impractical. Automatic pain detection technology could therefore be deployed to assist human caregivers and complement their work, thereby improving the quality of pain management, especially for noncommunicative patients. Facial expressions are a reliable indicator of pain and are used in all observer-based pain assessment tools. Following advances in automatic facial expression analysis, computer vision researchers have sought to apply this technology to the automatic detection of pain from facial expressions. This paper surveys the literature published in this field over the past decade, categorizes it, and identifies future research directions. The survey covers the pain datasets used in the reviewed literature, the learning tasks targeted by the approaches, the features extracted from images and image sequences to represent pain-related information, and, finally, the machine learning methods used.
It is only a matter of time until autonomous vehicles become ubiquitous; however, human driving supervision will remain a necessity for decades. To assess the driver's ability to take control of the vehicle in critical scenarios, driver distractions can be monitored using wearable sensors or sensors embedded in the vehicle, such as video cameras. Which types of driving distraction can be sensed with which sensors is an open research question that this study attempts to answer. This study compared data from physiological sensors (palm electrodermal activity (pEDA), heart rate and breathing rate) and visual sensors (eye tracking, pupil diameter, nasal EDA (nEDA), emotional activation and facial action units (AUs)) for the detection of four types of distraction. The dataset was collected in a previous driving simulation study. Statistical tests showed that the most informative feature/modality for detecting driver distraction depends on the type of distraction, with emotional activation and AUs being the most promising. Seven classical machine learning (ML) and seven end-to-end deep learning (DL) methods were compared experimentally and evaluated on a separate test set of 10 subjects. When classifying windows as distracted or not distracted, the highest F1-score, 79%, was achieved by the extreme gradient boosting (XGB) classifier using 60-second windows of AUs as input. When classifying complete driving sessions, XGB achieved an F1-score of 94%. The best-performing DL model was a spectro-temporal ResNet, which achieved an F1-score of 75% when classifying segments and 87% when classifying complete driving sessions. Finally, this study identified and discussed problems, such as label jitter, scenario overfitting and unsatisfactory generalization performance, that may adversely affect related ML approaches.
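The abstract contrasts window-level and session-level classification. The sketch below illustrates such a pipeline under stated assumptions: an AU time series sampled at a hypothetical rate `fs` is split into non-overlapping 60-second windows, each window is summarized by the mean and standard deviation of every AU (a hypothetical feature choice), and window predictions are aggregated into a session label by majority vote, an aggregation rule the abstract does not specify. The XGB classifier itself is left out; any binary classifier could supply the window predictions. The F1-score follows its standard definition.

```python
import numpy as np


def segment_windows(signal: np.ndarray, fs: float, window_s: float = 60.0) -> np.ndarray:
    """Split a (T, C) signal into non-overlapping windows of window_s seconds.

    Trailing samples that do not fill a whole window are dropped.
    Returns an array of shape (n_windows, samples_per_window, C).
    """
    n = int(fs * window_s)
    n_windows = signal.shape[0] // n
    return signal[: n_windows * n].reshape(n_windows, n, signal.shape[1])


def window_features(windows: np.ndarray) -> np.ndarray:
    """Hypothetical per-window features: mean and std of every channel (e.g. every AU)."""
    return np.concatenate([windows.mean(axis=1), windows.std(axis=1)], axis=1)


def session_label(window_preds) -> int:
    """Assumed aggregation rule: a session is 'distracted' (1) if most windows are."""
    return int(np.mean(window_preds) > 0.5)


def f1_score(y_true, y_pred) -> float:
    """F1 = harmonic mean of precision and recall for the positive class (1)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return float(2 * precision * recall / (precision + recall))


# Example: 5 minutes of 17 AU intensities at a hypothetical 30 Hz gives five 60-s windows.
aus = np.random.default_rng(0).random((30 * 60 * 5, 17))
feats = window_features(segment_windows(aus, fs=30))
print(feats.shape)                                    # (5, 34)
print(round(f1_score([1, 0, 1, 1], [1, 0, 0, 1]), 3))  # 0.8
```

Longer windows give each prediction more context but fewer training samples per session, which is one reason window length is worth treating as a tunable parameter.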
Deep neural networks are successfully used for object and face recognition in images and videos. However, current procedures are only suitable to a limited extent for applying such networks in practice, for example as a pain recognition tool in hospitals. The advantage of deep neural methods is that they can learn complex non-linear relationships between raw data and target classes without being restricted to a set of hand-crafted features provided by humans. The disadvantage is that, owing to the complexity of these networks, the knowledge stored inside them cannot be interpreted: they are black-box learners. Explainable Artificial Intelligence (AI) approaches mitigate this problem by extracting explanations for decisions and representing them in a human-interpretable form. The aim of this paper is to investigate two explainable AI methods, Layer-wise Relevance Propagation (LRP) and Local Interpretable Model-agnostic Explanations (LIME). These approaches are applied to explain how a deep neural network distinguishes facial expressions of pain from facial expressions of emotions such as happiness and disgust.
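To make the LIME idea concrete, the following simplified sketch (not the paper's implementation and not the API of the `lime` package) explains a single prediction of a black-box scorer on a grayscale image. It assumes a square grid of patches as a stand-in for the superpixels LIME normally uses, switches random subsets of patches off by setting them to zero, weights each perturbed sample by an exponential kernel on its cosine distance to the unperturbed image, and fits a weighted ridge regression whose coefficients serve as patch importances. `predict_fn`, `grid_segments` and all parameter values are illustrative assumptions. LRP, which propagates relevance backwards through the network's layers, is not shown.

```python
import numpy as np


def grid_segments(h: int, w: int, cell: int) -> np.ndarray:
    """Stand-in for superpixels: label map assigning each pixel to a square grid cell."""
    rows = np.arange(h)[:, None] // cell
    cols = np.arange(w)[None, :] // cell
    n_cols = -(-w // cell)  # ceiling division
    return rows * n_cols + cols


def lime_explain(image, predict_fn, segments, n_samples=500,
                 kernel_width=0.25, alpha=1.0, seed=0):
    """Return one importance weight per segment for predict_fn(image) (grayscale image)."""
    rng = np.random.default_rng(seed)
    n_seg = int(segments.max()) + 1
    masks = rng.integers(0, 2, size=(n_samples, n_seg))
    masks[0] = 1  # always include the unperturbed image

    # Query the black box on perturbed copies: switched-off segments are set to 0.
    preds = np.array([predict_fn(image * m[segments]) for m in masks], dtype=float)

    # Locality kernel: cosine distance between each mask and the all-ones mask.
    k = masks.sum(axis=1)
    cos_sim = k / (np.sqrt(k) * np.sqrt(n_seg) + 1e-12)
    weights = np.exp(-((1.0 - cos_sim) ** 2) / kernel_width ** 2)

    # Weighted ridge regression with an unpenalized intercept column.
    X = np.hstack([np.ones((n_samples, 1)), masks])
    reg = alpha * np.eye(n_seg + 1)
    reg[0, 0] = 0.0
    A = X.T @ (weights[:, None] * X) + reg
    b = X.T @ (weights * preds)
    coef = np.linalg.solve(A, b)
    return coef[1:]  # drop the intercept


# Toy black box: a "pain score" equal to the mean brightness of the top-left 4x4 region.
img = np.ones((8, 8))
segs = grid_segments(8, 8, 4)  # 4 patches; patch 0 is the top-left one
importance = lime_explain(img, lambda x: x[:4, :4].mean(), segs)
print(int(np.argmax(importance)))  # 0
```

The surrogate is only locally faithful: its coefficients describe how the score reacts to removing patches around this one image, which is what lets such explanations highlight the facial regions a network relies on.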
A wavelet-based multiview video coding scheme is presented in this paper. It uses a 4-D wavelet transform composed of a 1-D temporal wavelet transform, namely motion-compensated temporal filtering; a 1-D view-directional wavelet transform, namely disparity-compensated view filtering; and a 2-D spatial wavelet transform. Because the framework exploits the inherent scalability of the wavelet transforms involved, it allows full scalability of the coded bitstream in the temporal, view, spatial, and quality dimensions. Coding performance close to that of the standard multiview video codec based on H.264/Advanced Video Coding (AVC) is shown. Enhancements to the view transform that better account for brightness and color variations across views are introduced. Additionally, a signal-adaptive anisotropic wavelet packet (WP) transform, a generalization of WP transforms, is proposed for the spatial decomposition. Both enhancements lead to a bit-rate reduction of up to 11% compared with the baseline version of the codec.
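The separable structure of the 4-D transform can be sketched with the simplest wavelet, the Haar wavelet, implemented via lifting (a predict step followed by an update step) and applied in turn along the time, view, row and column axes of an array shaped (views, frames, height, width). This only illustrates the decomposition order and the resulting 16 subbands per level. It omits the motion and disparity compensation performed in the temporal and view filtering, and the Haar filter and the function names are assumptions rather than the codec's actual choices.

```python
import numpy as np


def haar_lift(x, axis):
    """One Haar level via lifting along `axis` (even length). Returns (low, high) bands.

    No motion or disparity compensation: even/odd samples are paired as-is.
    """
    x = np.moveaxis(np.asarray(x, dtype=float), axis, 0)
    even, odd = x[0::2], x[1::2]
    high = odd - even        # predict: residual of odd samples w.r.t. even samples
    low = even + high / 2    # update: low band equals the pairwise mean
    return np.moveaxis(low, 0, axis), np.moveaxis(high, 0, axis)


def haar_unlift(low, high, axis):
    """Inverse of haar_lift: undo the update, undo the predict, re-interleave."""
    low, high = np.moveaxis(low, axis, 0), np.moveaxis(high, axis, 0)
    even = low - high / 2
    odd = high + even
    out = np.empty((2 * even.shape[0],) + even.shape[1:])
    out[0::2], out[1::2] = even, odd
    return np.moveaxis(out, 0, axis)


def decompose_4d(video):
    """One level along time (axis 1), view (axis 0), rows (axis 2) and columns (axis 3).

    Returns 16 subbands keyed like 'tL_vH_yL_xL'.
    """
    bands = {"": np.asarray(video, dtype=float)}
    for name, axis in (("t", 1), ("v", 0), ("y", 2), ("x", 3)):
        split = {}
        for key, band in bands.items():
            low, high = haar_lift(band, axis)
            prefix = key + "_" if key else ""
            split[prefix + name + "L"] = low
            split[prefix + name + "H"] = high
        bands = split
    return bands


video = np.random.default_rng(1).random((4, 8, 16, 16))  # 4 views, 8 frames, 16x16 pixels
bands = decompose_4d(video)
print(len(bands), bands["tL_vL_yL_xL"].shape)  # 16 (2, 4, 8, 8)
```

Lifting stays exactly invertible even when the predict and update steps operate along motion or disparity trajectories, which is what makes it a convenient basis for compensated filtering; discarding high bands along one axis is what yields temporal, view or spatial scalability.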
This paper presents a technique for the efficient compression of high dynamic range (HDR) video sequences. Such sequences usually span several orders of magnitude of real-world luminance levels and are therefore mostly stored in a floating-point representation. To obtain a coded representation that is bit stream compatible with the H.264/AVC video coding standard, the floating-point HDR values must first be mapped to a suitable integer representation. The mapping proposed in this paper is adapted to the dynamic range of each video frame. Furthermore, to compensate for the resulting variation of contrast across frames, a weighted prediction method and quantization adaptation are introduced. The experiments show that the proposed method compresses HDR video highly efficiently: only a fraction of the bit rate of a non-adaptive reference method is required to represent an HDR video sequence at the same quality.
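A per-frame adaptive float-to-integer mapping, and the reason weighted prediction helps, can be sketched as follows. The sketch assumes a logarithmic mapping that stretches each frame's own log2-luminance range over the full range of an n-bit integer; the mapping actually used in the paper may differ, and `bits=12`, the function names and the clipping constant are illustrative. Because the (lo, hi) range changes from frame to frame, the same luminance receives different integer codes in different frames. Under this assumed mapping, the codes of two frames are related by an affine function, which is the weight-plus-offset form that H.264/AVC weighted prediction can express; `wp_params` derives that weight and offset.

```python
import numpy as np


def adaptive_log_map(lum, bits=12):
    """Map float luminance to integers using this frame's own log2 range.

    Returns (codes, (lo, hi)); (lo, hi) is side information needed by the decoder.
    """
    log_l = np.log2(np.maximum(lum, 1e-6))  # clip to avoid log2(0)
    lo, hi = float(log_l.min()), float(log_l.max())
    scale = (2 ** bits - 1) / max(hi - lo, 1e-9)
    codes = np.round((log_l - lo) * scale).astype(np.int32)
    return codes, (lo, hi)


def inverse_log_map(codes, side, bits=12):
    """Reconstruct float luminance from codes and the frame's (lo, hi)."""
    lo, hi = side
    return np.exp2(codes / (2 ** bits - 1) * (hi - lo) + lo)


def wp_params(side_ref, side_cur, bits=12):
    """Weight w and offset o with code_cur ~= w * code_ref + o for the same luminance.

    With S = 2**bits - 1 and code = S * (log2(L) - lo) / (hi - lo), eliminating
    log2(L) between the two frames gives w = (hi_r - lo_r) / (hi_c - lo_c) and
    o = S * (lo_r - lo_c) / (hi_c - lo_c).
    """
    (lo_r, hi_r), (lo_c, hi_c) = side_ref, side_cur
    w = (hi_r - lo_r) / (hi_c - lo_c)
    o = (2 ** bits - 1) * (lo_r - lo_c) / (hi_c - lo_c)
    return w, o


rng = np.random.default_rng(0)
frame = rng.uniform(0.01, 1000.0, size=(4, 4))  # hypothetical luminance values
codes, side = adaptive_log_map(frame)
rel_err = np.max(np.abs(inverse_log_map(codes, side) / frame - 1))
print(codes.min(), codes.max(), rel_err < 0.01)  # 0 4095 True
```

A logarithmic domain is a natural assumption here because relative luminance errors are roughly what the eye perceives; adapting (lo, hi) per frame spends all code values on the range that actually occurs in that frame.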
Over the last few decades, there has been an increasing call in the field of computer vision to use machine-learning techniques for the detection, categorization, and indexing of facial behaviors, as well as for the recognition of emotion phenomena. Automated facial expression analysis has become a highly competitive field for academic laboratories, startups and large technology corporations. This paper introduces the new Actor Study Database to address the resulting need for reliable benchmark datasets. The focus of the database is to provide real multi-view data that is not synthesized through perspective distortion. The database contains 68 minutes of high-quality video of facial expressions performed by 21 actors, recorded synchronously from five different angles. The actors' tasks ranged from displaying specific Action Units and their combinations at different intensities to enacting a variety of emotion scenarios. Over 1.5 million frames have been annotated and validated with the Facial Action Coding System (FACS) by certified FACS coders. These attributes make the Actor Study Database particularly suitable for machine recognition studies as well as for psychological research into affective phenomena, whether prototypical basic emotions or subtle emotional responses. Two state-of-the-art systems were used to produce benchmark results for all five views in the new database. The database is publicly available for non-commercial research.
We present a backward-compatible high dynamic range video coding framework based on H.264/AVC. It allows both a standard low dynamic range (LDR) video and a high dynamic range (HDR) video to be extracted from one compressed bit stream. A joint global and local inter-layer prediction method is proposed to reduce the redundancy between the LDR and HDR layers. It is based on a common color space that can represent HDR video data in a perceptually lossless way. We show how the inter-layer prediction parameters can be estimated in a rate-distortion optimized way and encoded efficiently to reduce side information. Our evaluations demonstrate that, for arbitrary tone-mapping operators, the proposed framework performs best compared to the state of the art. Compared with simulcast, it saves up to 50% of the bit rate.
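The global part of an inter-layer prediction can be illustrated with a simple lookup table that maps every LDR code value to the mean HDR value of the pixels carrying that code, so that only the residual has to be coded in the HDR enhancement layer. This is a generic sketch, not the paper's method: it ignores the local prediction component, the common color space, and the rate-distortion optimized estimation and coding of the parameters, and it assumes single-channel 8-bit LDR codes and a float HDR channel. The function names are hypothetical.

```python
import numpy as np


def fit_global_lut(ldr, hdr, levels=256):
    """Global inverse tone mapping: mean HDR value for each LDR code value.

    LDR codes must lie in [0, levels). Codes absent from the frame are filled
    by linear interpolation between the nearest codes that do occur.
    """
    ldr = np.asarray(ldr).ravel().astype(np.int64)
    hdr = np.asarray(hdr, dtype=float).ravel()
    sums = np.bincount(ldr, weights=hdr, minlength=levels)
    counts = np.bincount(ldr, minlength=levels)
    seen = counts > 0
    lut = np.zeros(levels)
    lut[seen] = sums[seen] / counts[seen]
    codes = np.arange(levels)
    lut[~seen] = np.interp(codes[~seen], codes[seen], lut[seen])
    return lut


def interlayer_residual(ldr, hdr, lut):
    """Residual left for the HDR enhancement layer after global prediction."""
    return np.asarray(hdr, dtype=float) - lut[np.asarray(ldr)]


rng = np.random.default_rng(0)
ldr = rng.integers(0, 256, size=(16, 16))
hdr = 2.0 ** (ldr / 32.0)  # toy HDR layer that is an exact function of the LDR code
lut = fit_global_lut(ldr, hdr)
print(np.abs(interlayer_residual(ldr, hdr, lut)).max())  # tiny: the LUT explains this toy case
```

In this toy case the global table removes essentially all redundancy; with a real, spatially varying tone-mapping operator the same LDR code maps to different HDR values in different regions, which is the situation a local prediction component is meant to handle.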