Accurate ground-based cloud classification is a challenging task and still under development. Most current methods take only the visual features of clouds into consideration, which is not robust to environmental factors. In this paper, we present a novel joint fusion convolutional neural network (JFCNN) that integrates multimodal information for ground-based cloud classification. To learn heterogeneous features (visual and multimodal) from ground-based cloud data, we design the JFCNN as a two-stream structure containing a vision subnetwork and a multimodal subnetwork. We also propose a novel layer, named the joint fusion layer, to jointly learn the two kinds of cloud features under one framework. After training the JFCNN, we extract the visual and multimodal features from the two subnetworks and integrate them using a weighted strategy. The proposed JFCNN is validated on the multimodal ground-based cloud (MGC) dataset and achieves remarkable performance, demonstrating its effectiveness for the ground-based cloud classification task.
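The weighted integration of the two streams can be illustrated with a minimal NumPy sketch; the function name, the L2 normalization, and the weight value are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def weighted_fusion(visual_feat, multimodal_feat, w=0.7):
    """Combine features from the vision and multimodal subnetworks.

    Each stream is L2-normalized so that the weight w controls the
    relative contribution of the two heterogeneous feature vectors.
    """
    v = visual_feat / np.linalg.norm(visual_feat)
    m = multimodal_feat / np.linalg.norm(multimodal_feat)
    # Weighted concatenation: the fused vector feeds a downstream classifier
    return np.concatenate([w * v, (1 - w) * m])
```

The fused vector can then be passed to any standard classifier (e.g., a linear SVM), since the weighting happens purely at the feature level.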
Since cloud images captured from different views exhibit extreme variations, multi-view ground-based cloud recognition is a very challenging task. In this paper, we present a study of view shift in this field. We focus both on designing a proper feature representation and on learning distance metrics from sample pairs. Correspondingly, we propose transfer deep local binary patterns (TDLBP) and weighted metric learning (WML). On one hand, to deal with view shift, such as variations in illumination, location, resolution and occlusion, we first utilize cloud images to train a convolutional neural network (CNN), and then extract local features from the part summing maps (PSMs) based on its feature maps. Finally, we maximize the occurrences of regions to obtain the final feature representation. On the other hand, the number of cloud images in each category varies greatly, leading to unbalanced similar pairs. Hence, we propose a weighted strategy for metric learning. We validate the proposed method on three cloud datasets (MOC_e, IAP_e, and CAMS_e) collected by different meteorological organizations in China, and the experimental results show the effectiveness of the proposed method.
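The local-pattern extraction and region-wise maximization underlying TDLBP can be sketched as follows. This is a simplified illustration of standard 8-neighbor LBP histograms with element-wise max pooling over regions, applied to a plain array rather than the paper's PSMs; it is not the authors' code:

```python
import numpy as np

def lbp_histogram(img):
    """Normalized histogram of 8-neighbor local binary pattern codes."""
    c = img[1:-1, 1:-1]  # interior pixels (the code's center values)
    neighbors = [img[0:-2, 0:-2], img[0:-2, 1:-1], img[0:-2, 2:],
                 img[1:-1, 2:],   img[2:, 2:],     img[2:, 1:-1],
                 img[2:, 0:-2],   img[1:-1, 0:-2]]
    codes = np.zeros(c.shape, dtype=np.int32)
    for bit, n in enumerate(neighbors):
        # Set bit if the neighbor is at least as bright as the center
        codes += (n >= c).astype(np.int32) << bit
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist / hist.sum()

def max_pool_regions(region_hists):
    """Element-wise maximum over region histograms ("maximize occurrences")."""
    return np.max(np.stack(region_hists), axis=0)
```

The max pooling keeps, for each pattern, its strongest occurrence over regions, which is what makes the representation tolerant to local view changes.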
Driven by the rapid development of digital imaging and network technologies, opinion-unaware blind image quality assessment (BIQA) has become an important yet very challenging task. In this paper, we design a novel and effective scheme for opinion-unaware BIQA. We first utilize convolutional maps to select high-contrast patches, and then use the selected patches of pristine images to train a pristine multivariate Gaussian (PMVG) model. In the test stage, each high-contrast patch is fitted by a test MVG (TMVG) model, and a local quality score is obtained by comparing it with the PMVG model. Finally, we propose deep activation pooling (DAP) to automatically emphasize the more important scores and suppress the less important ones, yielding the overall image quality score. We verify the proposed method on two widely used databases, namely the computational and subjective image quality (CSIQ) and the laboratory for image and video engineering (LIVE) databases, and the experimental results demonstrate that the proposed method achieves better results than state-of-the-art methods.
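Fitting and comparing multivariate Gaussian models can be sketched in NumPy as below. The NIQE-style distance shown is one common way to compare a pristine and a test MVG; the function names and feature dimensionality are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def fit_mvg(patch_feats):
    """Fit a multivariate Gaussian (mean, covariance) to patch features.

    patch_feats: array of shape (n_patches, feat_dim).
    """
    mu = patch_feats.mean(axis=0)
    sigma = np.cov(patch_feats, rowvar=False)
    return mu, sigma

def mvg_distance(mu_p, sig_p, mu_t, sig_t):
    """NIQE-style distance between pristine and test Gaussians.

    Larger distance = the test patch statistics deviate more from
    pristine statistics, i.e., lower predicted quality.
    """
    diff = mu_p - mu_t
    pooled = (sig_p + sig_t) / 2.0
    return float(np.sqrt(diff @ np.linalg.pinv(pooled) @ diff))
```

A per-patch score of this form is exactly what a pooling stage (such as the proposed DAP) would then weight and aggregate into one overall score.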
Cross-domain ground-based cloud classification is a challenging issue, as the appearance of cloud images from different cloud databases exhibits extreme variations. Two problems fundamental to cross-domain ground-based cloud classification are feature representation and similarity measurement. In this paper, we propose an effective feature representation called transfer of local features (TLF) and a measurement method called discriminative metric learning (DML). TLF is a generalized representation framework that can integrate various kinds of local features, e.g., local binary patterns (LBP), local ternary patterns (LTP) and completed LBP (CLBP). To handle domain shift, such as variations in illumination, image resolution, capturing location and occlusion, TLF mines the maximum response within regions to build a representation that is stable under domain variations. We also simultaneously learn a discriminative metric, making use of sample pairs and the relationships among cloud classes. Furthermore, to improve the practicability of the proposed method, we replace the original cloud images with convolutional activation maps, which are then fed to TLF and DML. The proposed method has been validated on three cloud databases collected in China, provided by the Chinese Academy of Meteorological Sciences (CAMS), the Meteorological Observation Centre (MOC), and the Institute of Atmospheric Physics (IAP). The classification accuracies outperform state-of-the-art methods.
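The metric-learning idea can be illustrated with a classical closed-form surrogate: estimating the metric matrix M as the inverse within-class scatter of similar pairs, then measuring Mahalanobis-style distances under M. This is a deliberate simplification for illustration, not the DML objective described in the paper:

```python
import numpy as np

def learn_metric(similar_pairs):
    """Estimate a metric M from pairs known to belong to the same class.

    M is the pseudo-inverse of the within-class scatter, so directions
    with large same-class variation are down-weighted.
    """
    diffs = np.array([x - y for x, y in similar_pairs])
    scatter = diffs.T @ diffs / len(diffs)
    return np.linalg.pinv(scatter)

def metric_dist(x, y, M):
    """Distance between feature vectors under the learned metric M."""
    d = x - y
    return float(np.sqrt(max(d @ M @ d, 0.0)))
```

In a cross-domain setting, pairs would be drawn across databases so that M explicitly discounts domain-induced variation.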
Recently, part information of pedestrian images has been demonstrated to be effective for person re-identification (ReID), but part interaction is ignored when using a Transformer to learn long-range dependencies. In this paper, we propose a novel transformer network named Completed Part Transformer (CPT) for person ReID, in which we design a part transformer layer to learn the completed part interaction. The part transformer layer includes an intra-part layer and a part-global layer, which consider long-range dependencies from the aspects of intra-part interaction and part-global interaction simultaneously. Furthermore, to overcome the limitation of a fixed number of patch tokens in the transformer layer, we propose the Adaptive Refined Tokens (ART) module, which focuses on learning the interaction between informative patch tokens in the pedestrian image and improves the discrimination of the pedestrian representation. Extensive experimental results on four person ReID datasets, i.e., MSMT17, Market1501, DukeMTMC-reID and CUHK03, demonstrate that the proposed method achieves new state-of-the-art performance, e.g., 68.0% mAP and 84.6% Rank-1 accuracy on MSMT17.
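The intra-part and part-global interactions can be illustrated with a minimal single-head attention sketch in NumPy; there are no learned projections here, and the slicing of tokens into parts is an illustrative assumption rather than the CPT architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention (single head, no projections)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def part_transformer_layer(tokens, part_slices):
    """Two-stage interaction per part: intra-part, then part-global.

    tokens: (n_tokens, dim) patch tokens; part_slices: token groups
    corresponding to body parts (an illustrative partition).
    """
    out = np.zeros_like(tokens)
    for sl in part_slices:
        p = tokens[sl]
        intra = attention(p, p, p)                    # intra-part interaction
        out[sl] = attention(intra, tokens, tokens)    # part-global interaction
    return out
```

Restricting the first attention to each part's own tokens is what models intra-part structure, while the second stage lets every part attend to all tokens in the image.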