We propose new sets of Fourier-Mellin descriptors for color images. They are built on the Clifford Fourier transform of Batard et al. (2010) and extend the classical Fourier-Mellin descriptors for grayscale images. The new descriptors are invariant under direct similarity transformations (translations, rotations, and scaling), and marginal treatment of the color channels is avoided. An implementation of these features is given, and the choice of the bivector (a distinguished color plane that parameterizes the Clifford Fourier transform) is discussed. The proposed formalism extends and clarifies the notion of direction of analysis introduced for the quaternionic Fourier-Mellin moments (Guo and Zhu, 2011); accordingly, a further set of descriptors invariant under this parameter is defined. Our proposals are tested for object recognition on well-known color image databases, and their retrieval rates compare favourably with those of standard feature descriptors.
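The invariance at the heart of Fourier-Mellin descriptors can be illustrated with a minimal sketch (not the paper's color construction): once an image is resampled on a polar grid, a rotation becomes a circular shift along the angular axis, which only changes the phase of the angular Fourier coefficients, not their modulus. The grid sizes and the test pattern below are hypothetical.

```python
import numpy as np

# Minimal sketch of the rotation-invariance mechanism behind
# Fourier-Mellin descriptors, for a grayscale image assumed to be
# already sampled on a polar grid f[radius, angle].

def angular_descriptors(f_polar, n_orders=5):
    """|c_k| summed over radii, for angular orders k = 0..n_orders-1."""
    coeffs = np.fft.fft(f_polar, axis=1)        # Fourier series in theta
    return np.abs(coeffs[:, :n_orders]).sum(axis=0)

# Toy "image" on a 32 x 64 polar grid (hypothetical test pattern).
rng = np.random.default_rng(0)
f = rng.random((32, 64))

d_original = angular_descriptors(f)
# Rotating the image = circularly shifting the angular axis.
d_rotated = angular_descriptors(np.roll(f, 10, axis=1))

print(np.allclose(d_original, d_rotated))  # magnitudes are unchanged
```

A circular shift multiplies each angular coefficient by a unit-modulus phase factor, so the magnitude descriptors match to machine precision; full Fourier-Mellin descriptors add an analogous treatment of the radial (scale) axis.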
The aim of this paper is to propose two approaches to color object recognition, both using the recently defined color Clifford Fourier transform. The first deals with so-called Generalized Fourier Descriptors, whose definition relies on plane motion group actions. The proposed color extension leads to descriptors that are more compact, have lower complexity, and achieve better recognition rates than existing descriptors based on processing the r, g, and b channels separately (referred to below as marginal processing). The second approach concerns phase correlation for color images; the idea is to generalize, in the Clifford framework, the usual way of measuring correlation derived from the well-known shift theorem. Both methods require choosing a bivector B of R_{4,0}, which corresponds to an analysis plane in the color space. The relevance of the proposed methods for classification purposes is discussed on several color image databases; in particular, the influence of the parameter B is studied with respect to the type of images.
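For reference, the classical grayscale phase correlation that the second approach generalizes can be sketched as follows: by the shift theorem, a translation multiplies the spectrum by a phase ramp, so the normalized cross-power spectrum inverse-transforms to a sharp peak located at the translation. The image sizes and test shift below are arbitrary.

```python
import numpy as np

# Sketch of classical (grayscale) phase correlation from the shift
# theorem -- the construction the paper extends to the Clifford setting.

def phase_correlation(a, b):
    """Return the (row, col) shift t such that a == np.roll(b, t)."""
    Fa, Fb = np.fft.fft2(a), np.fft.fft2(b)
    cross = Fa * np.conj(Fb)
    cross /= np.abs(cross) + 1e-12          # keep only the phase
    corr = np.fft.ifft2(cross).real         # delta at the translation
    r, c = np.unravel_index(np.argmax(corr), corr.shape)
    return int(r), int(c)

rng = np.random.default_rng(1)
img = rng.random((64, 64))
shifted = np.roll(img, shift=(5, 12), axis=(0, 1))

print(phase_correlation(shifted, img))  # -> (5, 12)
```

Normalizing the cross-power spectrum to unit magnitude is what makes the correlation peak sharp and robust to illumination changes; the color generalization replaces the complex phase with a Clifford-valued one parameterized by the bivector B.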
Although much progress has been made in facial expression analysis, facial occlusions remain challenging. The main innovation of this contribution consists in exploiting the specificities of facial movement propagation to recognize expressions in the presence of significant occlusions. The movement induced by an expression extends beyond the movement epicenter, so movement occurring in an occluded region propagates towards neighboring visible regions. In the presence of occlusions, we compute, per expression, the importance of each unoccluded facial region and construct adapted facial frameworks that boost the performance of the per-expression binary classifiers. The output of each expression-dependent binary classifier is then aggregated and fed into a fusion process that aims at constructing, per occlusion, a single model recognizing all the facial expressions considered. The evaluations highlight the robustness of this approach in the presence of significant facial occlusions.
This article relies on two recent developments of well-known methods: a color Fourier transform using geometric algebra [1] and Generalized Fourier Descriptors defined from the group M2 of motions of the plane [2]. New generalized color Fourier descriptors (GCFD) are proposed; they depend on the choice of a bivector B acting as an analysis plane in a colorimetric space. The relevance of the proposed descriptors is discussed on several color image databases, and, in particular, the influence of the parameter B is studied with respect to the type of images. The proposed descriptors prove more compact, have lower complexity, and achieve better classification rates.
Many research works focus on leveraging the complementary geometric information of indoor depth sensors in vision tasks performed by deep convolutional neural networks, notably semantic segmentation. These works address a specific vision task known as "RGB-D Indoor Semantic Segmentation", whose challenges and resulting solutions differ from those of its standard RGB counterpart, making it a new and active research topic. The objective of this paper is to introduce the field of deep convolutional neural networks for RGB-D indoor semantic segmentation. This review presents the most popular public datasets, proposes a categorization of the strategies employed by recent contributions, evaluates the performance of the current state of the art, and discusses the remaining challenges and promising directions for future work.
With the advent of neuromorphic hardware, spiking neural networks can be an energy-efficient alternative to artificial neural networks. However, their use for computer vision remains limited, mainly to simple tasks such as digit recognition; more complex tasks (e.g. segmentation, object detection) remain hard to address, owing to the small number of works on deep spiking neural networks for them. The objective of this paper is to take a first step towards modern computer vision with supervised spiking neural networks. We propose a deep convolutional spiking neural network for the localization of a single object in a grayscale image, based on DECOLLE, a spiking model that enables local surrogate-gradient learning. The encouraging results reported on Oxford-IIIT-Pet validate the exploitation of spiking neural networks with a supervised learning approach for more elaborate vision tasks in the future.
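The basic computational unit of such networks can be sketched with a minimal leaky integrate-and-fire (LIF) neuron (this is an illustration of spiking dynamics in general, not the paper's DECOLLE architecture; the weight, leak, and threshold values are made up):

```python
# Minimal sketch of a leaky integrate-and-fire (LIF) neuron, the basic
# unit of spiking neural networks.  The membrane potential leaks toward
# zero, integrates weighted input spikes, and emits an output spike
# (followed by a reset) whenever it crosses the threshold.

def lif_neuron(input_spikes, weight=0.6, leak=0.9, threshold=1.0):
    """Simulate one LIF neuron; return the binary output spike train."""
    v = 0.0
    out = []
    for s in input_spikes:
        v = leak * v + weight * s        # leak, then integrate the input
        if v >= threshold:               # fire and reset
            out.append(1)
            v = 0.0
        else:
            out.append(0)
    return out

# Regular input: one spike every other time step, for 20 steps.
inputs = [1, 0] * 10
print(sum(lif_neuron(inputs)))  # -> 5 output spikes
```

Because the thresholding step is non-differentiable, training such neurons with gradient descent requires a surrogate gradient, which is precisely what models like DECOLLE provide, locally at each layer.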