Creating large, richly annotated databases that depict real-world or simulated real-world conditions is a challenging task. There has long been a recognized need for recognizing human facial expressions in realistic video scenarios. Although many expression databases are available, research has been constrained by their limited scope, which stems from their 'lab-controlled' recording environments. This paper proposes a new temporal facial expression database, Acted Facial Expressions in the Wild (AFEW), and its static subset, Static Facial Expressions in the Wild (SFEW), both extracted from movies. Because creating databases is time-consuming and complex, a novel semi-automatic approach based on a subtitle-driven recommender system is proposed. Further, experimental protocols based on varying levels of person dependency are defined. AFEW is compared with the extended Cohn-Kanade (CK+) database, and SFEW with the JAFFE and Multi-PIE databases.
Index Terms: facial expression recognition, large-scale database, real-world conditions, emotion database
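The abstract does not describe how the subtitle-based recommender works. As a minimal sketch under our own assumptions, one could scan subtitle cues (for example, captions for the hearing impaired, which often contain tags such as "[laughing]") for emotion-related keywords and suggest the surrounding time span as a candidate clip for a human labeller to verify. The keyword lists, the `padding` parameter, and the helper names `parse_srt` and `recommend_clips` are hypothetical, not taken from the paper.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword lists; the paper's actual vocabulary is not given.
EMOTION_KEYWORDS = {
    "happy": {"laugh", "laughs", "laughing", "smile", "smiles", "happy"},
    "angry": {"shout", "shouts", "yell", "yells", "angry", "furious"},
    "sad": {"cry", "cries", "crying", "sob", "sobs", "sad"},
}

TIMESTAMP = re.compile(r"(\d+):(\d+):(\d+)[,.](\d+)")


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def to_seconds(stamp):
    """Convert an SRT timestamp such as '00:01:02,500' to seconds."""
    h, m, s, ms = map(int, TIMESTAMP.match(stamp).groups())
    return h * 3600 + m * 60 + s + ms / 1000


def parse_srt(srt_text):
    """Split SRT text into cues: index line, 'start --> end' line, then text lines."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip malformed blocks
        start, end = (part.strip() for part in lines[1].split("-->"))
        cues.append(Cue(to_seconds(start), to_seconds(end), " ".join(lines[2:])))
    return cues


def recommend_clips(cues, padding=1.0):
    """Return (start, end, label) suggestions for cues containing emotion keywords."""
    suggestions = []
    for cue in cues:
        words = set(re.findall(r"[a-z]+", cue.text.lower()))
        for label, keywords in EMOTION_KEYWORDS.items():
            if words & keywords:
                suggestions.append((max(0.0, cue.start - padding), cue.end + padding, label))
    return suggestions


SAMPLE = """1
00:00:01,000 --> 00:00:03,500
[laughing] That's great!

2
00:00:05,000 --> 00:00:06,000
Pass the salt.

3
00:00:10,000 --> 00:00:12,000
She cries softly.
"""

print(recommend_clips(parse_srt(SAMPLE)))
```

A human annotator would still confirm or reject each suggestion; the recommender only narrows the search, which is the sense in which such a pipeline would be semi-automatic.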
The problem of data encoding and feature selection for training back-propagation neural networks is well known. The basic principles are to avoid encrypting the underlying structure of the data and to avoid using irrelevant inputs. This is not easy in the real world, where we often receive data that has already been processed by at least one previous user. The data may contain too many instances of some classes and too few instances of others. Real data sets often include many irrelevant or redundant input fields. This paper examines the use of weight matrix analysis techniques and functional measures on two real (and hence noisy) data sets.

The first part of this paper examines the use of the trained neural network's own weight matrix to determine which inputs are significant. A new technique is introduced and compared with two other techniques from the literature. We present our experience and results on satellite data augmented by a terrain model, where the task was to predict the forest supra-type from the available information. A brute-force technique that eliminates randomly selected inputs was used to validate our approach.

The second part of this paper examines the use of measures that determine the functional contribution of inputs to outputs. Inputs that contribute minor but unique information to the network are more significant than inputs with a larger-magnitude contribution whose information is redundant because another input also provides it. A comparison is made with sensitivity analysis, where the sensitivity of the outputs to input perturbations is used as a measure of input significance.

This paper presents a novel functional analysis of the weight matrix based on a technique developed for determining the behavioral significance of hidden neurons. This is compared with applying the same technique to the training and test data. Finally, a novel aggregation technique is introduced.
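The abstract does not give the formulas for the new technique. As a hedged illustration of the two families of measures it compares, the sketch below computes Garson's weight-magnitude importance (a well-known measure from the literature, not necessarily one of the two the paper uses) and a finite-difference sensitivity analysis for a single-hidden-layer network with tanh hidden units and linear outputs. The network shape, weights, and data are hypothetical.

```python
import math


def garson_importance(w_ih, w_ho):
    """Garson-style relative importance of each input.

    w_ih[j][i]: weight from input i to hidden unit j.
    w_ho[k][j]: weight from hidden unit j to output k.
    """
    n_in = len(w_ih[0])
    scores = [0.0] * n_in
    for out_row in w_ho:
        for j, row in enumerate(w_ih):
            denom = sum(abs(w) for w in row)  # total input weight into hidden unit j
            for i, w in enumerate(row):
                scores[i] += abs(w) / denom * abs(out_row[j])
    total = sum(scores)
    return [s / total for s in scores]  # normalised to sum to 1


def forward(x, w_ih, w_ho):
    """tanh hidden layer, linear output layer, no biases (for brevity)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_ih]
    return [sum(v * h for v, h in zip(row, hidden)) for row in w_ho]


def sensitivity(data, w_ih, w_ho, eps=1e-4):
    """Mean absolute output change per unit perturbation of each input."""
    n_in = len(w_ih[0])
    scores = [0.0] * n_in
    for x in data:
        base = forward(x, w_ih, w_ho)
        for i in range(n_in):
            perturbed = list(x)
            perturbed[i] += eps
            out = forward(perturbed, w_ih, w_ho)
            scores[i] += sum(abs(o - b) for o, b in zip(out, base)) / eps
    return [s / len(data) for s in scores]


# Hypothetical trained weights: input 2 is effectively disconnected.
W_IH = [[2.0, 0.5, 0.0], [1.5, -0.2, 0.0]]
W_HO = [[1.0, -1.0]]
DATA = [[0.1, 0.2, 0.3], [-0.5, 0.4, 1.0]]

print(garson_importance(W_IH, W_HO))
print(sensitivity(DATA, W_IH, W_HO))
```

Neither measure on its own detects redundancy: two inputs carrying the same information can both score highly, which is the gap the abstract's functional measures are meant to address.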
A problem with gradient descent algorithms is that they can converge to poorly performing local minima. Global optimization algorithms address this problem, but at the cost of greatly increased training times. This work examines combining gradient descent with the global optimization technique of simulated annealing (SA). Simulated annealing, in the form of noise and weight decay, is added to resilient backpropagation (RPROP), a powerful gradient descent algorithm for training feedforward neural networks. The resulting algorithm, SARPROP, is shown through various simulations not only to escape local minima but also to maintain, and often improve on, the training times of the RPROP algorithm. In addition, SARPROP may be used with a restart training phase, which allows a more thorough search of the error surface and provides an automatic annealing schedule.
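The abstract names the ingredients (RPROP plus annealed noise and weight decay) but not the exact update rule. The following is a minimal sketch in that spirit, assuming the iRprop⁻ variant of RPROP, an annealing factor of 2^(-temperature · epoch), a weight-decay term of the form w/(1 + w²), and noise added to the step size after a gradient sign change. All of these choices and every default value are assumptions, not SARPROP's published formulation.

```python
import random


def sign(v):
    return (v > 0) - (v < 0)


def sarprop_sketch(grad_fn, w0, epochs=300, eta_plus=1.2, eta_minus=0.5,
                   step_init=0.1, step_min=1e-6, step_max=1.0,
                   temperature=0.05, decay_k=0.01, seed=0):
    """iRprop- with annealed weight decay and annealed step-size noise.

    The annealing schedule, decay term and noise rule are illustrative
    assumptions, not the published SARPROP update.
    """
    rng = random.Random(seed)
    w = list(w0)
    steps = [step_init] * len(w)
    prev_grad = [0.0] * len(w)
    for epoch in range(epochs):
        anneal = 2.0 ** (-temperature * epoch)  # decays from 1 towards 0
        grad = grad_fn(w)
        for i in range(len(w)):
            # Annealed weight-decay term added to the error gradient.
            g = grad[i] + decay_k * anneal * w[i] / (1.0 + w[i] ** 2)
            if g * prev_grad[i] > 0:
                # Same sign as last epoch: accelerate.
                steps[i] = min(steps[i] * eta_plus, step_max)
            elif g * prev_grad[i] < 0:
                # Sign change: shrink the step, add annealed random noise,
                # and skip this update (iRprop- behaviour).
                noise = anneal * rng.random() * step_init
                steps[i] = min(max(steps[i] * eta_minus + noise, step_min), step_max)
                g = 0.0
            w[i] -= sign(g) * steps[i]
            prev_grad[i] = g
    return w


# Toy error surface: E(w) = (w0 - 3)^2 + (w1 + 1)^2.
print(sarprop_sketch(lambda w: [2 * (w[0] - 3.0), 2 * (w[1] + 1.0)], [0.0, 0.0]))
```

Early in training the noise can kick the step size up after an oscillation, which is what lets such a scheme jump out of shallow basins; as the annealing factor shrinks, the update reduces to plain RPROP behaviour.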
In recent years, searching the web on mobile devices has become enormously popular. Because mobile devices have relatively small screens and show fewer search results, search behavior on mobile devices may differ from that on desktops or laptops. Examining these differences may therefore suggest better, more efficient designs for mobile search engines. In this experiment, we use eye tracking to explore user behavior and performance. We analyze web searches for 2 task types on 2 differently sized screens: one for a desktop and the other for a mobile device. In addition, we examine the relationships between search performance and several search behaviors to further investigate the differences the screens produce. We found that users have more difficulty extracting information from search results pages on the smaller screen, although they exhibit less eye movement because they use the scroll function infrequently. However, in terms of search performance, our findings suggest no significant difference between the 2 screens in time spent on search results pages or in the accuracy of finding answers. These findings suggest several possible ideas for the presentation design of search results pages on small devices.
This paper deals with the approximation behaviour of soft computing techniques. First, we survey the universal approximation theorems achieved so far in various soft computing areas, mainly fuzzy control and neural networks. We point out that these techniques share a common approximation behaviour: an arbitrary function from a certain set of functions (usually the set of continuous functions, C) can be approximated with arbitrary accuracy ε on a compact domain. The drawback of these results is that an unbounded number of "building blocks" (i.e. fuzzy sets or hidden neurons) is needed to achieve the prescribed accuracy ε. If the number of building blocks is restricted, it has been proved for some fuzzy systems that the universal approximation property is lost; moreover, the set of controllers with a bounded number of rules is nowhere dense in the set of continuous functions. It is therefore reasonable to make a trade-off between accuracy and the number of building blocks by determining the functional relationship between them. We survey this topic by presenting the results achieved so far and their inherent limitations. We point out that approximation rates or constructive proofs can be given only if some smoothness characteristic of the approximated function is known.
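To make the accuracy versus building-block trade-off concrete, here is an illustration of our own choosing (not one of the surveyed results): a single-input zero-order Takagi-Sugeno fuzzy system with n uniformly spaced triangular fuzzy sets and consequents f(c_k) at the rule centres c_k. With these membership functions the system reduces to piecewise-linear interpolation, so for a twice-differentiable f with |f''| ≤ M the sup error is at most M·h²/8, where h is the spacing between rule centres. Knowing the smoothness bound M is exactly what makes such a rate statable; the helper names are hypothetical.

```python
import math


def triangular(x, center, width):
    """Triangular membership function with peak 1 at `center`."""
    return max(0.0, 1.0 - abs(x - center) / width)


def fuzzy_approximator(f, a, b, n_rules):
    """Zero-order Takagi-Sugeno system: rule k reads 'if x is A_k then y = f(c_k)'."""
    h = (b - a) / (n_rules - 1)
    centers = [a + k * h for k in range(n_rules)]
    consequents = [f(c) for c in centers]

    def system(x):
        mu = [triangular(x, c, h) for c in centers]
        # Weighted average of rule consequents (defuzzification).
        return sum(m * y for m, y in zip(mu, consequents)) / sum(mu)

    return system, h


def sup_error(f, g, a, b, samples=2001):
    """Maximum absolute error over a uniform sample of [a, b]."""
    xs = (a + (b - a) * i / (samples - 1) for i in range(samples))
    return max(abs(f(x) - g(x)) for x in xs)


def error_table(f, a, b, rule_counts):
    rows = []
    for n in rule_counts:
        g, h = fuzzy_approximator(f, a, b, n)
        rows.append((n, h, sup_error(f, g, a, b)))
    return rows


# sin has |f''| <= 1 on [0, pi], so M = 1 and the bound is h^2 / 8.
for n, h, err in error_table(math.sin, 0.0, math.pi, [3, 5, 9, 17]):
    print(f"{n:3d} rules  h={h:.4f}  sup error={err:.2e}  bound h^2/8={h * h / 8:.2e}")
```

Each doubling of the rule density cuts the error by roughly a factor of four, but for any fixed number of rules the error stays bounded away from zero for a general continuous function, which mirrors the abstract's point about bounded building blocks.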
Face recognition in real-world conditions requires the ability to deal with a number of variations, such as those in pose, illumination and expression. In this paper, we focus on variations in head pose and use a computationally efficient regression-based approach for synthesising face images in different poses, which are used to extend the face recognition training set. In this data-driven approach, the correspondences between facial landmark points in frontal and non-frontal views are learnt offline from manually annotated training data via Gaussian Process Regression. We then use this learner to synthesise non-frontal face images from any unseen frontal image. To demonstrate the utility of this approach, two frontal face recognition systems (the commonly used PCA and the recent Multi-Region Histograms) are augmented with synthesised non-frontal views for each person. This synthesis and augmentation approach is experimentally validated on the FERET dataset, showing a considerable improvement in recognition rates for ±40° and ±60° views, while maintaining high recognition rates for ±15° and ±25° views.
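The abstract does not specify the kernel, hyperparameters, or landmark layout used. As a minimal sketch under those gaps, the following computes the Gaussian Process Regression posterior mean that maps a flattened vector of frontal landmark coordinates to non-frontal ones, using a squared-exponential kernel, a small noise term, and output centring. Real landmark vectors would have 2 × (number of landmarks) entries; the class name `LandmarkGPR`, the toy "squeeze x" transform standing in for a yaw change, and all parameter values are hypothetical.

```python
import math


def rbf(a, b, length_scale):
    """Squared-exponential kernel between two feature vectors."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-0.5 * d2 / length_scale ** 2)


def cholesky(A):
    """Lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def cho_solve(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


class LandmarkGPR:
    """GP posterior mean mapping frontal landmark vectors to non-frontal ones."""

    def __init__(self, length_scale=0.5, noise=1e-6):
        self.length_scale = length_scale
        self.noise = noise

    def fit(self, X, Y):
        n, d_out = len(X), len(Y[0])
        self.X = [list(x) for x in X]
        self.mean = [sum(y[j] for y in Y) / n for j in range(d_out)]
        K = [[rbf(X[i], X[j], self.length_scale) + (self.noise if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        L = cholesky(K)
        # One weight vector per output coordinate; all share the same kernel.
        self.alpha = [cho_solve(L, [Y[i][j] - self.mean[j] for i in range(n)])
                      for j in range(d_out)]
        return self

    def predict(self, x):
        k = [rbf(x, xi, self.length_scale) for xi in self.X]
        return [m + sum(ki * ai for ki, ai in zip(k, alpha))
                for m, alpha in zip(self.mean, self.alpha)]


# Hypothetical toy data: one landmark (x, y); the "non-frontal" view squeezes x.
grid = [(x / 2, y / 2) for x in range(3) for y in range(3)]
targets = [(0.8 * x, y) for x, y in grid]
model = LandmarkGPR().fit(grid, targets)
print(model.predict((0.25, 0.25)))
```

The predicted non-frontal landmarks would then drive an image warp to synthesise the new view; that warping step is outside this sketch.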
Stress is a serious concern in today's world, which motivates developing a better, objective understanding of it through non-intrusive stress recognition that places fewer restrictions on natural human behavior. As an initial step in computer vision-based stress detection, this paper proposes ANUStressDB, a temporal thermal spectrum (TS) and visible spectrum (VS) video database and a major contribution to stress research. The database contains videos of 35 subjects watching stressed and not-stressed film clips that were validated by the subjects. We present the experiment and the procedure used to acquire videos of subjects' faces while they watched the films for ANUStressDB. Further, a baseline stress detection model is presented that computes the local binary patterns on three orthogonal planes (LBP-TOP) descriptor on VS and TS videos. An LBP-TOP-inspired descriptor, histograms of dynamic thermal patterns (HDTP), was used to capture dynamic thermal patterns by exploiting spatio-temporal characteristics of TS videos. Support vector machines were used for our stress detection model. A genetic algorithm was used to select salient facial block divisions for stress classification and to determine whether certain regions of the subjects' faces showed better stress patterns. Results showed that fusing facial patterns from VS and TS videos produced statistically significantly better stress recognition rates than using patterns from VS or TS videos in isolation. Moreover, genetic algorithm selection led to statistically significantly better stress detection rates than classifiers that used all facial block divisions. In addition, the best stress recognition rate was obtained from HDTP features fused with LBP-TOP features for TS and VS videos, using a hybrid genetic algorithm and support vector machine stress detection model. This model achieved an accuracy of 86%.
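For readers unfamiliar with the baseline descriptor, here is a minimal sketch of basic LBP-TOP: an LBP code is computed for each interior voxel in the XY (spatial), XT and YT (space-time) planes, and the three 256-bin histograms are normalised and concatenated. It assumes radius 1, 8 square neighbours without interpolation, and a single block covering the whole clip; the paper's block divisions, radii, and the HDTP descriptor are not reproduced here.

```python
# Offsets (row, col) of the 8 neighbours at radius 1, clockwise from top-left.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(center, neighbours):
    """Basic LBP: bit p is set when neighbour p is >= the centre value."""
    return sum(1 << p for p, g in enumerate(neighbours) if g >= center)


def lbp_top(volume):
    """Concatenated, normalised 256-bin LBP histograms from the XY, XT and YT planes.

    `volume` is indexed as volume[t][y][x]; border voxels are skipped.
    """
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    if min(T, H, W) < 3:
        raise ValueError("need at least 3 frames, rows and columns")
    hist = {plane: [0] * 256 for plane in ("XY", "XT", "YT")}
    for t in range(1, T - 1):
        for y in range(1, H - 1):
            for x in range(1, W - 1):
                c = volume[t][y][x]
                # Spatial plane (fixed t), then the two space-time planes.
                hist["XY"][lbp_code(c, [volume[t][y + a][x + b] for a, b in OFFSETS])] += 1
                hist["XT"][lbp_code(c, [volume[t + a][y][x + b] for a, b in OFFSETS])] += 1
                hist["YT"][lbp_code(c, [volume[t + a][y + b][x] for a, b in OFFSETS])] += 1
    n = (T - 2) * (H - 2) * (W - 2)
    return [count / n for plane in ("XY", "XT", "YT") for count in hist[plane]]


# Toy 3x3x3 clip whose intensity brightens frame by frame (value = frame index).
clip = [[[t] * 3 for _ in range(3)] for t in range(3)]
features = lbp_top(clip)
print(len(features), features[255], features[256 + 248], features[512 + 248])
# prints: 768 1.0 1.0 1.0
```

In the toy clip every frame is uniform, so the XY plane yields code 255, while the temporal planes yield code 248 because only neighbours from the earlier frame are darker than the centre. In a block-based setup, one such feature vector would be computed per facial block, and a selection method such as the paper's genetic algorithm would choose which blocks to keep.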