We describe a new method for remote emotional state assessment using multispectral face videos, and present our findings: unique transdermal, cardiovascular and spatiotemporal facial patterns associated with different emotional states. The method does not rely on stereotypical facial expressions but utilizes different wavelength sensitivities (visible spectrum, near-infrared, and long-wave infrared) to gauge correlates of autonomic nervous system activity spatially and temporally distributed across the human face (e.g., blood flow, hemoglobin concentration, and temperature). We conducted an experiment where 110 participants viewed 150 short emotion-eliciting videos and reported their emotional experience, while three cameras recorded facial videos with multiple wavelengths. Spatiotemporal multispectral features from the multispectral videos were used as inputs to a machine learning model that was able to classify participants’ emotional state (i.e., amusement, disgust, fear, sexual arousal, or no emotion) with satisfactory results (average ROC AUC score of 0.75), while providing feature importance analysis that allows the examination of facial occurrences per emotional state. We discuss findings concerning the different spatiotemporal patterns associated with different emotional states as well as the different advantages of the current method over existing approaches to emotion detection.
Traditional image signal processors (ISPs) are primarily designed and optimized to improve the image quality perceived by humans. However, optimal perceptual image quality does not always translate into optimal performance for computer vision applications. We propose a set of methods, which we collectively call VisionISP, to repurpose the ISP for machine consumption. VisionISP significantly reduces data transmission needs by reducing the bit-depth and resolution while preserving the relevant information. The blocks in VisionISP are simple, content-aware, and trainable. Experimental results show that VisionISP boosts the performance of a subsequent computer vision system trained to detect objects in an autonomous driving setting. The results demonstrate the potential and the practicality of VisionISP for computer vision applications.
Fig. 1. Comparison of ISP output images with different tunings on IMX260 ISO400. From left to right: Not-tuned, Hand-tuned by image quality expert, Auto-tuned.
ABSTRACTImage Signal Processor (ISP) comprises of various blocks to reconstruct image sensor raw data to final image consumed by human visual system or computer vision applications. Each block typically has many tuning parameters due to the complexity of the operation. These need to be hand tuned by Image Quality (IQ) experts, which takes considerable amount of time. In this paper, we present an automatic IQ tuning using nonlinear optimization and automatic reference generation algorithms. The proposed method can produce high quality IQ in minutes as compared with weeks of handtuned results by IQ experts. In addition, the proposed method can work with any algorithms without being aware of their specific implementation. It was found successful on multiple different processing blocks such as noise reduction, demosaic, and sharpening.
Demosaicing is an algorithm used to reconstruct a color image from the incomplete color samples of a color filter array (CFA). Most demosaicing algorithms can be broadly classified into spatial-domain and frequency-domain approaches. Despite significant progress in the past decade, current state of the art demosaicing algorithms still tend to produce artifacts at high-saturation edges. In this paper we propose a new approach to demosaicing -example based. Comparative experimental evaluation shows that example-based demosaicing (EBD) produces visually superior, artifact-free results.
In a typical video conferencing setup, it is hard to maintain eye contact during a call since it requires looking into the camera rather than the display. We propose an eye contact correction model that restores the eye contact regardless of the relative position of the camera and display. Unlike previous solutions, our model redirects the gaze from an arbitrary direction to the center without requiring a redirection angle or camera/display/user geometry as inputs. We use a deep convolutional neural network that inputs a monocular image and produces a vector field and a brightness map to correct the gaze. We train this model in a bi-directional way on a large set of synthetically generated photorealistic images with perfect labels. The learned model is a robust eye contact corrector which also predicts the input gaze implicitly at no additional cost.Our system is primarily designed to improve the quality of video conferencing experience. Therefore, we use a set of control mechanisms to prevent creepy results and to ensure a smooth and natural video conferencing experience. The entire eye contact correction system runs end-to-end in real-time on a commodity CPU and does not require any dedicated hardware, making our solution feasible for a variety of devices.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.