The choice of image feature representation plays a crucial role in the analysis of visual information. Although a vast number of robust feature representation models have been proposed to improve performance on different visual tasks, most existing feature representations (e.g., handcrafted features or Convolutional Neural Networks (CNNs)) have a relatively limited capacity to capture highly orientation-invariant (rotation/reversal) features. The net consequence is suboptimal visual performance. To address this problem, this study adopts a novel transformational approach that investigates the potential of polar feature representations. Our low level consists of a histogram of oriented gradients, which is then binned using annular spatial bin-type cells applied to the polar gradient. This gives gradient-binning invariance for feature extraction, so the descriptors have significantly enhanced orientation-invariant capabilities. The proposed feature representation, termed orientation-invariant histograms of oriented gradients (Oi-HOG), is capable of accurately processing visual tasks (e.g., facial expression recognition). In the context of the CNN architecture, we propose two polar convolution operations, referred to as Full Polar Convolution (FPolarConv) and Local Polar Convolution (LPolarConv), and use these to develop polar architectures for orientation-invariant CNN representation. Experimental results on a set of challenging benchmark image datasets show that the proposed orientation-invariant image representation, based on polar models for both handcrafted and deep learning features, is competitive with state-of-the-art methods while maintaining a compact representation. Index Terms—Rotation-invariant and reversal-invariant representation, HOG, CNN.
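The core idea of the abstract above, measuring gradient orientations in a polar frame so that rotating the image leaves the histogram unchanged, can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's Oi-HOG implementation: the function name `polar_hog`, the single-cell histogram, and all parameters are illustrative assumptions. Gradient angles are taken relative to the radial direction from the patch centre, so a rotation of the patch rotates both the gradient and the reference axis, leaving the relative angle (and hence the bin) stable.

```python
import numpy as np

def polar_hog(patch, n_bins=8):
    """Hedged sketch of an orientation-invariant HOG cell (illustrative
    simplification, not the paper's Oi-HOG): orientations are binned
    relative to the radial direction from the patch centre."""
    patch = patch.astype(float)
    gy, gx = np.gradient(patch)                       # image gradients
    mag = np.hypot(gx, gy)                            # gradient magnitude
    ang = np.arctan2(gy, gx)                          # absolute gradient angle
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Radial reference direction from the patch centre to each pixel.
    radial = np.arctan2(ys - (h - 1) / 2, xs - (w - 1) / 2)
    # Orientation relative to the radial axis: rotation-invariant quantity.
    rel = np.mod(ang - radial, 2 * np.pi)
    bins = (np.floor(rel / (2 * np.pi / n_bins)).astype(int)) % n_bins
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist
```

Because a 90-degree rotation maps the pixel grid onto itself exactly, `polar_hog(patch)` and `polar_hog(np.rot90(patch))` produce the same histogram, which is the invariance property the abstract claims for annular polar binning.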
We propose a novel and general framework called the multithreading cascade of Speeded Up Robust Features (McSURF), which is capable of processing multiple classifications simultaneously and accurately. The proposed framework adopts SURF features, but the cascade itself is multi-class and simultaneous, i.e., a multithreading cascade. McSURF is implemented by converting the area under the receiver operating characteristic (ROC) curve (AUC) of each data category's weak SURF classifier into a real-valued lookup list. These non-interfering lists are built into thread channels to train the boosting cascade for each data category. This boosting cascade-based approach can be trained to fit complex distributions and can simultaneously and robustly process multi-class events. The proposed method takes facial expression recognition as a test case and is validated on three popular and representative public databases: the Extended Cohn-Kanade database, the MMI Facial Expression Database, and the Annotated Facial Landmarks in the Wild database. Overall results show that this framework outperforms other state-of-the-art methods.
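The two building blocks named in the abstract above, an AUC measure for ranking weak classifiers and a real-valued lookup list per category, can be sketched as follows. This is a hedged approximation, not McSURF itself: the lookup table is a RealBoost-style half-log-odds table, the AUC is computed with the rank (Mann-Whitney) statistic, and both function names and parameters are illustrative assumptions.

```python
import numpy as np

def auc_score(scores, labels):
    """AUC via the rank (Mann-Whitney) statistic; assumes continuous
    scores (ties are not specially handled in this sketch)."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def lookup_weak_classifier(feature, labels, weights, n_bins=16):
    """Real-valued lookup-table weak learner (RealBoost-style sketch):
    each feature bin stores half the log odds of weighted positives
    versus weighted negatives for one category."""
    edges = np.quantile(feature, np.linspace(0, 1, n_bins + 1))
    idx = np.clip(np.searchsorted(edges, feature, side="right") - 1,
                  0, n_bins - 1)
    eps = 1e-6                                   # smoothing for empty bins
    wp = np.bincount(idx, weights=weights * (labels == 1),
                     minlength=n_bins) + eps
    wn = np.bincount(idx, weights=weights * (labels != 1),
                     minlength=n_bins) + eps
    table = 0.5 * np.log(wp / wn)                # real-valued lookup list
    return edges, table, table[idx]
```

In a multi-class setting, one such lookup list would be maintained per category and evaluated in its own channel, which is consistent with the "non-interfering lists built into thread channels" described above.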
Our research focuses on classifiers that are capable of processing images rapidly and accurately without having to rely on a large-scale dataset, thus presenting a robust classification framework for both facial expression recognition (FER) and object recognition. The framework is based on support vector machines (SVMs) and employs three key approaches to enhance its robustness. First, it uses the perturbed subspace method (PSM) to extend the range of the sample space for training, which is an effective way to improve the robustness of a training system. Second, the framework adopts Speeded Up Robust Features (SURF), which are better suited to real-time situations. Third, it introduces region attributes to evaluate and revise the classification results produced by the SVMs, improving their classifying ability. Combining these approaches, the proposed method makes the following beneficial contributions. First, the efficiency of SVMs is improved: experiments show that the proposed approach effectively reduces the number of samples, resulting in an obvious reduction in training time. Second, the recognition accuracy is comparable to that of state-of-the-art algorithms. Third, its versatility is excellent, allowing it to be applied not only to object recognition but also to FER.
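The first approach in the abstract above, extending the range of the sample space by perturbing existing training samples, can be illustrated with a small augmentation sketch. This is an assumption-laden illustration of the general idea, not the paper's PSM: the function name `perturbed_subspace`, the choice of perturbations (small cyclic shifts plus additive noise), and all parameters are hypothetical.

```python
import numpy as np

def perturbed_subspace(samples, n_variants=3, max_shift=2,
                       noise_std=0.01, seed=0):
    """Hedged sketch of PSM-style sample-space extension (illustrative,
    not the paper's method): each training image is copied with a small
    random translation and additive Gaussian noise, so the classifier
    sees a wider neighbourhood of each sample without new data."""
    rng = np.random.default_rng(seed)
    out = [samples]                                  # keep the originals
    for _ in range(n_variants):
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        shifted = np.roll(samples, (dy, dx), axis=(1, 2))  # cyclic shift
        out.append(shifted + rng.normal(0, noise_std, shifted.shape))
    return np.concatenate(out, axis=0)
```

The augmented set would then be fed to the SVM trainer in place of the raw samples; the claimed efficiency gain comes from needing fewer originally collected samples, not from the augmentation itself being cheap.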