We present an efficient approach for leveraging knowledge from multiple modalities when training unimodal 3D convolutional neural networks (3D-CNNs) for dynamic hand gesture recognition. Instead of explicitly combining multimodal information, as is commonplace in many state-of-the-art methods, we propose a framework that embeds the knowledge of multiple modalities into individual networks so that each unimodal network achieves improved performance. In particular, we dedicate a separate network to each available modality and train the networks to collaborate, so that they develop common semantics and better representations. We introduce a "spatiotemporal semantic alignment" (SSA) loss to align the content of the features from the different networks, and regularize this loss with our proposed "focal regularization parameter" to avoid negative knowledge transfer. Experimental results show that our framework improves the test-time recognition accuracy of unimodal networks and achieves state-of-the-art performance on several dynamic hand gesture recognition datasets.
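The idea of an alignment loss gated by a focal parameter can be illustrated with a minimal sketch. The exact functional forms below (squared distance between L2-normalised feature vectors, and a polynomial gate on the gap between the two networks' task losses) are assumptions for illustration only; the paper's precise definitions may differ.

```python
import numpy as np

def ssa_loss(feat_a, feat_b, loss_a, loss_b, beta=2.0):
    """Illustrative spatiotemporal semantic alignment (SSA) loss.

    feat_a, feat_b: (T, H, W, C) spatiotemporal feature maps from two
    unimodal networks (e.g. RGB and depth streams).
    loss_a, loss_b: scalar task losses of the two networks, used here to
    form a "focal regularization parameter" that suppresses transfer from
    a weaker network (hypothetical form, not the paper's exact one).
    """
    # L2-normalise channel vectors so alignment compares semantics, not scale.
    a = feat_a / (np.linalg.norm(feat_a, axis=-1, keepdims=True) + 1e-8)
    b = feat_b / (np.linalg.norm(feat_b, axis=-1, keepdims=True) + 1e-8)
    misalignment = np.sum((a - b) ** 2, axis=-1).mean()
    # Focal gate: only transfer knowledge toward network A when network B
    # currently performs better; otherwise the alignment term is switched off,
    # avoiding negative transfer.
    if loss_a > loss_b:
        rho = ((loss_a - loss_b) / max(loss_a, 1e-8)) ** beta
    else:
        rho = 0.0
    return rho * misalignment
```

Note how the gate makes the loss vanish both when the features already agree and when the would-be teacher network is the weaker one.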
Exemplar-based learning, or equivalently nearest-neighbor methods, have recently gained interest from researchers in a variety of computer science domains, owing to the prevalence of large amounts of accessible data and storage capacity. In computer vision, these techniques have been successful in several problems such as scene recognition, shape matching, image parsing, character recognition, and object detection. Applying exemplar-based learning to the color constancy problem seems odd at first glance: similar nearest-neighbor images are not usually affected by precisely similar illuminants, and gathering a dataset of all possible real-world images, covering indoor and outdoor scenes under all possible illuminant colors and intensities, is clearly impossible. In this paper, we instead focus on surfaces in the image and address the color constancy problem by unsupervised learning of an appropriate model for each surface in the training images. For each surface in a test image, we find nearest-neighbor models and estimate its illumination by comparing the statistics of pixels belonging to the nearest-neighbor surfaces and the target surface. The final illumination estimate results from combining these per-surface estimates into a unique answer. We show that the method performs very well on standard datasets compared to current color constancy algorithms, including when a model learned on one image dataset is applied to tests from a different dataset. The proposed method also has the advantage of handling multi-illuminant situations, which is not possible for most current methods since they assume the color of the illuminant is constant over the entire image.
We show a technique to handle the multiple-illuminant situation using the proposed method and test it on images with two distinct sources of illumination from a multiple-illuminant color constancy dataset. The concept proposed here is a completely new approach to the color constancy problem and provides a simple learning-based framework.
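The per-surface pipeline described above can be sketched in a few lines. Everything here is a simplification under stated assumptions: the "model" of a training surface is reduced to its raw pixels under a canonical light, the matching signature is per-channel contrast (std/mean, which is invariant to a diagonal von Kries illuminant scaling), the per-surface illuminant estimate is a simple ratio of mean RGBs, and the combination rule is a median. The actual paper learns richer surface models and combination rules.

```python
import numpy as np

def surface_signature(pixels):
    """Per-channel contrast (std/mean) of an (N, 3) RGB pixel set.

    Illustrative matching feature: under a diagonal illuminant scaling
    p' = p * L, both std and mean scale by L per channel, so this
    signature is illuminant-invariant (a toy stand-in for the paper's
    learned surface models).
    """
    mu = pixels.mean(axis=0)
    return pixels.std(axis=0) / (mu + 1e-8)

def estimate_scene_illuminant(test_surfaces, train_surfaces):
    """Estimate one scene illuminant from per-surface exemplar matches.

    test_surfaces:  list of (N_i, 3) RGB pixel arrays, one per test surface.
    train_surfaces: list of (M_j, 3) RGB pixel arrays under canonical light.
    """
    per_surface = []
    for s in test_surfaces:
        sig = surface_signature(s)
        # Nearest-neighbour training surface by signature distance.
        j = min(range(len(train_surfaces)),
                key=lambda k: np.linalg.norm(
                    surface_signature(train_surfaces[k]) - sig))
        # Von Kries-style ratio of mean RGBs: test surface vs. its exemplar.
        est = s.mean(axis=0) / (train_surfaces[j].mean(axis=0) + 1e-8)
        per_surface.append(est / est.sum())
    # Combine the per-surface estimates into a unique scene estimate.
    ill = np.median(np.stack(per_surface), axis=0)
    return ill / ill.sum()
```

Because each surface yields its own illuminant estimate before combination, dropping the final median and keeping the per-surface estimates is exactly what makes the multi-illuminant extension possible.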
We address the problem of scene depth recovery from cross-spectral stereo imagery, where each image is sensed over a differing spectral range. We compare several robust matching techniques able to capture local similarities between the structure of cross-spectral images, together with a range of stereo optimisation techniques for computing valid depth estimates in this setting. Specifically, we deal with the recovery of dense depth information from thermal (far-infrared spectrum) and optical (visible spectrum) image pairs, where large differences in the characteristics of the image pair make this task significantly more challenging than the common stereo case. We show that the use of dense gradient features, based on Histogram of Oriented Gradients (HOG) descriptors, for pixel matching in combination with a strong match optimisation approach can produce largely valid, yet coarse, dense depth estimates suitable for object localisation or environment navigation. The proposed solution is compared against, and shown to work favourably with respect to, prior approaches based on Mutual Information (MI) or Local Self-Similarity (LSS) descriptors.
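A minimal sketch shows why gradient-orientation descriptors suit cross-spectral matching: orientations taken modulo pi are invariant to contrast polarity inversion, which is a dominant effect between thermal and visible imagery. The descriptor below is a simplified per-pixel orientation histogram, not full HOG (no cell/block structure), and the matching is plain winner-takes-all along the scanline rather than the stronger optimisation the paper uses; both are illustrative stand-ins.

```python
import numpy as np

def dense_hog(img, bins=8, win=5):
    """Per-pixel gradient-orientation histogram over a (win x win) window.

    Simplified stand-in for dense HOG: magnitudes are accumulated into
    orientation bins taken modulo pi, so a contrast-inverted image (as
    between thermal and visible bands) yields the same descriptor.
    """
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ori = np.mod(np.arctan2(gy, gx), np.pi)          # sign-invariant orientation
    bin_idx = np.minimum((ori / np.pi * bins).astype(int), bins - 1)
    h, w = img.shape
    r = win // 2
    desc = np.zeros((h, w, bins))
    for y in range(h):
        for x in range(w):
            ys = slice(max(0, y - r), y + r + 1)
            xs = slice(max(0, x - r), x + r + 1)
            np.add.at(desc[y, x], bin_idx[ys, xs].ravel(), mag[ys, xs].ravel())
            n = np.linalg.norm(desc[y, x])
            if n > 0:
                desc[y, x] /= n
    return desc

def wta_disparity(desc_l, desc_r, max_disp):
    """Winner-takes-all scanline matching on descriptor distance
    (a weak baseline standing in for the stronger optimisation)."""
    h, w, _ = desc_l.shape
    disp = np.zeros((h, w), dtype=int)
    for y in range(h):
        for x in range(w):
            costs = [np.linalg.norm(desc_l[y, x] - desc_r[y, x - d])
                     for d in range(min(max_disp, x) + 1)]
            disp[y, x] = int(np.argmin(costs))
    return disp
```

Even with a contrast-inverted left image, the descriptors match exactly, which is the property that makes intensity-based costs (SAD, NCC) fail here while gradient-orientation costs survive.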
Identification of the illumination is an important problem in imaging. In this paper, we present a new and effective physics-based colour constancy algorithm which makes use of a novel log-relative-chromaticity planar constraint. We call the new feature the Zeta-image. We show that this new feature is tied to a novel application of the Kullback-Leibler divergence, here applied to chromaticity values instead of probabilities. The new method requires no training data or tunable parameters. Moreover, it is simple to implement and very fast. Our experimental results across datasets of real images show that the proposed method significantly outperforms other unsupervised methods, while its estimation accuracy is comparable with more complex, supervised methods. In addition, the new planar constraint can be used as a post-processing stage for any candidate colour constancy method in order to improve its accuracy.
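The core idea of treating L1-normalised chromaticities like probability vectors can be sketched as follows. This is only an illustration of that idea, not the paper's Zeta-image construction: candidate illuminants are scored by a KL-divergence-like measure between the chromaticities of the brightest (near-specular) pixels and each candidate's chromaticity, exploiting the physics-based fact that near-specular pixels tend toward the illuminant colour. The quantile threshold and candidate-set formulation are assumptions of this sketch.

```python
import numpy as np

def zeta_scores(pixels, candidates, top_frac=0.1):
    """Score candidate illuminants on (N, 3) RGB pixels.

    For each candidate, compute a per-pixel KL-divergence-like quantity
    between the L1-normalised chromaticity of the brightest pixels and the
    candidate's chromaticity, treating both as probability vectors
    (illustrative use of the KL idea, not the paper's exact formula).
    """
    inten = pixels.sum(axis=1)
    bright = pixels[inten >= np.quantile(inten, 1 - top_frac)]
    chroma = bright / bright.sum(axis=1, keepdims=True)
    scores = []
    for c in candidates:
        q = c / c.sum()
        kl = np.sum(chroma * np.log(chroma / q), axis=1)  # per-pixel "KL"
        scores.append(kl.mean())
    return np.array(scores)

def estimate_illuminant(pixels, candidates):
    """Pick the candidate whose chromaticity the bright pixels diverge from least."""
    return candidates[int(np.argmin(zeta_scores(pixels, candidates)))]
```

Because the score is a property of a candidate rather than of a particular estimator, the same scoring idea can rank the outputs of any other colour constancy method, mirroring the post-processing use mentioned in the abstract.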