In this paper, we propose a deep multimodal fusion network that fuses multiple modalities (face, iris, and fingerprint) for person identification. The proposed algorithm consists of multiple streams of modality-specific Convolutional Neural Networks (CNNs), which are jointly optimized at multiple feature abstraction levels. Features are extracted at several different convolutional layers of each modality-specific CNN for joint feature fusion, optimization, and classification; features from different convolutional layers represent the input at different levels of abstraction. We demonstrate that efficient multimodal classification can be accomplished with a significant reduction in the number of network parameters by exploiting these multi-level abstract representations from all the modality-specific CNNs. Using the proposed multi-level representations in the fusion, rather than only the features from the last layer of each modality-specific CNN, increases multimodal person identification performance. We show that our deep multimodal CNNs with fusion at several feature abstraction levels significantly outperform unimodal representations in accuracy, and that joint optimization of all the modality-specific CNNs outperforms score- and decision-level fusion of independently optimized CNNs.
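The fusion idea can be made concrete with a small sketch. Below is a minimal PyTorch illustration, assuming three modality streams and two fusion taps per stream; the layer sizes, pooled feature dimensions, and number of identities are illustrative placeholders, not the paper's configuration.

```python
# A minimal sketch of joint multi-level multimodal fusion (assumed sizes).
import torch
import torch.nn as nn

class ModalityStream(nn.Module):
    def __init__(self, in_ch=1):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(in_ch, 32, 3, padding=1),
                                    nn.ReLU(), nn.MaxPool2d(2))
        self.block2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1),
                                    nn.ReLU(), nn.MaxPool2d(2))
        self.pool = nn.AdaptiveAvgPool2d(1)  # global pooling per tap

    def forward(self, x):
        f1 = self.block1(x)          # early, low-level abstraction
        f2 = self.block2(f1)         # later, high-level abstraction
        # Return pooled features from *both* levels, not just the last layer.
        return torch.cat([self.pool(f1).flatten(1),
                          self.pool(f2).flatten(1)], dim=1)  # 32 + 64 dims

class MultimodalFusionNet(nn.Module):
    def __init__(self, num_ids=100):
        super().__init__()
        self.face, self.iris, self.finger = (ModalityStream() for _ in range(3))
        self.classifier = nn.Linear(3 * 96, num_ids)  # joint fusion layer

    def forward(self, face, iris, finger):
        fused = torch.cat([self.face(face), self.iris(iris),
                           self.finger(finger)], dim=1)
        return self.classifier(fused)

# One loss over the fused logits back-propagates into every stream,
# i.e. all modality-specific CNNs are optimized jointly.
net = MultimodalFusionNet()
logits = net(torch.randn(4, 1, 64, 64), torch.randn(4, 1, 64, 64),
             torch.randn(4, 1, 64, 64))
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 100, (4,)))
loss.backward()
```

Because a single classifier consumes taps from every depth of every stream, one backward pass trains all streams jointly, which is the contrast drawn with fusing independently optimized CNNs at the score or decision level.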
Vehicle-to-Vehicle (V2V) communication has great potential to improve the reaction accuracy of driver assistance systems in critical driving situations. Cooperative Adaptive Cruise Control (CACC), an automated application, provides drivers with extra benefits such as traffic throughput maximization and collision avoidance. CACC systems must be designed to be sufficiently robust against special maneuvers such as vehicles cutting into a CACC platoon or hard braking by leading cars. To address this problem, a Neural-Network (NN)-based cut-in detection and trajectory prediction scheme is proposed in the first part of this paper. Next, a probabilistic framework is developed in which the cut-in probability is calculated from the output of the cut-in prediction block. Finally, a Stochastic Model Predictive Controller (SMPC) is designed that incorporates this cut-in probability to enhance its reaction to the detected dangerous cut-in maneuver. The overall system is implemented and its performance is evaluated on realistic driving scenarios from the Safety Pilot Model Deployment (SPMD).
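To make the detector-to-controller handoff concrete, here is a minimal sketch assuming a small feed-forward network over hand-picked kinematic features (lateral offset, lateral velocity, relative speed, gap); the feature set, network size, and the way the probability hedges the controller's gap are all illustrative, not the paper's exact design.

```python
# A minimal sketch: NN cut-in probability feeding a probability-aware gap.
import torch
import torch.nn as nn

cut_in_net = nn.Sequential(          # NN-based cut-in detector (assumed sizes)
    nn.Linear(4, 16), nn.ReLU(),
    nn.Linear(16, 1), nn.Sigmoid())  # outputs P(cut-in) in [0, 1]

features = torch.tensor([[1.2, 0.4, -1.5, 18.0]])  # hypothetical observation
p_cut_in = cut_in_net(features).item()

# The controller can hedge against the maneuver: blend the nominal
# following gap with the (shorter) gap expected if the cut-in happens.
gap_nominal, gap_if_cut_in = 30.0, 12.0   # meters, illustrative values
expected_gap = (1 - p_cut_in) * gap_nominal + p_cut_in * gap_if_cut_in
# A stochastic MPC would instead propagate this distribution through its
# prediction horizon and enforce the gap constraint with high probability.
print(f"P(cut-in) = {p_cut_in:.2f}, expected gap = {expected_gap:.1f} m")
```

The one-step expectation shown here stands in for the SMPC's horizon-long chance constraints; the key point is that the detector's output enters the controller as a probability rather than a hard yes/no decision.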
Disentangling factors of variation within data has become a very challenging problem for image generation tasks. Current frameworks for training a Generative Adversarial Network (GAN) learn to disentangle the representations of the data in an unsupervised fashion and capture the most significant factors of data variation. However, these approaches ignore the principle of content and style disentanglement in image generation, which means their learned latent code may alter the content and style of the generated images at the same time. This paper describes the Style and Content Disentangled GAN (SC-GAN), a new unsupervised algorithm for training GANs that learns disentangled style and content representations of the data. We assume that the representation of an image can be decomposed into a content code that represents the geometrical information of the data, and a style code that captures textural properties. Consequently, by fixing the style portion of the latent representation, we can generate diverse images in a particular style. Conversely, we can fix the content code and generate a specific scene in a variety of styles. The proposed SC-GAN has two components: a content code, which is the input to the generator, and a style code, which modifies the scene style by modifying the parameters of the Adaptive Instance Normalization (AdaIN) layers. We evaluate the proposed SC-GAN framework on a set of baseline datasets.
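The style pathway can be sketched in a few lines. Below is a minimal illustration, assuming the style code is mapped by a learned affine layer to the per-channel scale and bias of an AdaIN layer; all dimensions are placeholders, not SC-GAN's actual sizes.

```python
# A minimal sketch of style injection via AdaIN parameters (assumed dims).
import torch
import torch.nn as nn

class AdaIN(nn.Module):
    def __init__(self, style_dim, num_ch):
        super().__init__()
        self.norm = nn.InstanceNorm2d(num_ch, affine=False)
        self.affine = nn.Linear(style_dim, 2 * num_ch)  # predicts scale, bias

    def forward(self, content_feat, style_code):
        scale, bias = self.affine(style_code).chunk(2, dim=1)
        scale = scale[:, :, None, None] + 1.0  # center scale around 1
        bias = bias[:, :, None, None]
        # Style enters only through these channel statistics; the spatial
        # (content) structure of the normalized feature map is untouched.
        return scale * self.norm(content_feat) + bias

adain = AdaIN(style_dim=8, num_ch=64)
content = torch.randn(2, 64, 16, 16)   # from the content code's pathway
style = torch.randn(2, 8)              # style code
out = adain(content, style)
# Sampling new `style` with fixed `content` yields one scene in many styles;
# the reverse yields many scenes sharing one style.
```

This separation is what makes the two claims in the abstract possible: instance normalization strips the feature map's channel statistics (texture), and the style code re-imposes its own, leaving the spatial layout to the content code.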
In this paper, we present a deep coupled learning framework to address the problem of matching polarimetric thermal face photos against a gallery of visible faces. Polarization state information of thermal faces provides the missing textural and geometric details in thermal face imagery that exist in the visible spectrum. We propose a coupled deep neural network architecture that leverages relatively large visible and thermal datasets to overcome the problem of overfitting, and we then train it on a polarimetric thermal face dataset, the first of its kind. Compared to conventional shallow thermal-to-visible face recognition methods, the proposed architecture is able to make full use of the polarimetric thermal information to train a deep model. The proposed coupled deep neural network also finds global discriminative features in a nonlinear embedding space that relate the polarimetric thermal faces to their corresponding visible faces. The results show the superiority of our method over state-of-the-art thermal-to-visible face recognition algorithms.
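A coupled architecture of this kind can be sketched as two branches trained into one shared embedding space. The snippet below is a minimal illustration, assuming simple CNN branches and a contrastive loss; the branch architectures, the 4-channel polarimetric input, and the margin are assumptions for illustration, not the paper's exact networks.

```python
# A minimal sketch of coupled cross-spectrum embedding learning.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_branch(in_ch, emb_dim=128):
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, emb_dim))

visible_net = make_branch(in_ch=3)   # visible photos
thermal_net = make_branch(in_ch=4)   # e.g. Stokes-image channels (assumed)

def contrastive_loss(ev, et, same_identity, margin=1.0):
    d = F.pairwise_distance(ev, et)
    # Pull genuine cross-spectrum pairs together, push impostors apart.
    return torch.mean(same_identity * d.pow(2) +
                      (1 - same_identity) * F.relu(margin - d).pow(2))

ev = F.normalize(visible_net(torch.randn(8, 3, 64, 64)))
et = F.normalize(thermal_net(torch.randn(8, 4, 64, 64)))
loss = contrastive_loss(ev, et, torch.randint(0, 2, (8,)).float())
loss.backward()
```

At test time, a probe thermal embedding is matched against the gallery of visible embeddings by distance in this shared space, which is what "relating the polarimetric thermal faces to their corresponding visible faces" amounts to operationally.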
Face sketches are able to capture the spatial topology of a face while lacking some facial attributes such as race, skin, or hair color. Existing sketch-photo recognition approaches have mostly ignored the importance of facial attributes. In this paper, we propose a new loss function, called attribute-centered loss, to train a Deep Coupled Convolutional Neural Network (DCCNN) for facial-attribute-guided sketch-to-photo matching. Specifically, the attribute-centered loss learns several distinct centers, in a shared embedding space, for photos and sketches with different combinations of attributes. The DCCNN is trained simultaneously to map photos, and pairs of testified attributes and corresponding forensic sketches, around their associated centers, while preserving the spatial topology information. Importantly, the centers learn to keep a relative distance from each other related to the number of contradictory attributes between them. Extensive experiments are performed on composite (E-PRIP) and semi-forensic (IIIT-D Semi-forensic) databases. The proposed method significantly outperforms the state-of-the-art.
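One way to picture the loss is with a learnable center per attribute combination. The sketch below is a simplified illustration, assuming binary attributes and a spacing rule where the margin between two centers grows with the number of contradictory attributes (the Hamming distance between their attribute vectors); the exact formulation in the paper may differ.

```python
# A minimal sketch of an attribute-centered loss (simplified formulation).
import itertools
import torch
import torch.nn as nn
import torch.nn.functional as F

num_attrs, emb_dim = 3, 64
combos = torch.tensor(list(itertools.product([0, 1], repeat=num_attrs)),
                      dtype=torch.float)               # 2^3 = 8 combinations
centers = nn.Parameter(torch.randn(len(combos), emb_dim))

def attribute_centered_loss(photo_emb, sketch_emb, combo_idx, base_margin=1.0):
    c = centers[combo_idx]
    # Intra term: photos and sketches gather around their attribute center.
    intra = F.mse_loss(photo_emb, c) + F.mse_loss(sketch_emb, c)
    # Inter term: centers keep a distance that grows with the number of
    # contradictory attributes between their combinations.
    hamming = torch.cdist(combos, combos, p=1)          # contradiction count
    dist = torch.cdist(centers, centers)
    inter = F.relu(base_margin * hamming - dist).mean()
    return intra + inter

photo = torch.randn(4, emb_dim)   # stand-ins for DCCNN branch outputs
sketch = torch.randn(4, emb_dim)
loss = attribute_centered_loss(photo, sketch, torch.randint(0, 8, (4,)))
loss.backward()
```

The inter-center term encodes the abstract's key property: combinations that disagree on more attributes are pushed further apart in the shared embedding space, so attribute evidence narrows the search during sketch-to-photo matching.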