Accurate ground-based cloud classification is a challenging task and still under development. Most current methods take only visual cloud features into consideration, which makes them fragile to environmental factors. In this paper, we present a novel joint fusion convolutional neural network (JFCNN) that integrates multimodal information for ground-based cloud classification. To learn heterogeneous features (visual features and multimodal features) from ground-based cloud data, we design JFCNN as a two-stream structure containing a vision subnetwork and a multimodal subnetwork. We also propose a novel layer, named the joint fusion layer, to jointly learn the two kinds of cloud features under one framework. After training JFCNN, we extract the visual and multimodal features from the two subnetworks and integrate them using a weighted strategy. JFCNN is validated on the multimodal ground-based cloud (MGC) dataset and achieves remarkable performance, demonstrating its effectiveness for the ground-based cloud classification task.
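To make the two-stream idea concrete, below is a minimal PyTorch sketch of a two-stream network with a fusion layer and a weighted feature integration step. All layer sizes, the choice of multimodal inputs, the concatenation-based fusion, and the weight alpha are assumptions for illustration; the paper's exact JFCNN architecture may differ.

    import torch
    import torch.nn as nn

    class TwoStreamFusionNet(nn.Module):
        def __init__(self, num_classes=7, mm_dim=4, feat_dim=128):
            super().__init__()
            # Vision subnetwork: a small CNN over cloud images (assumed topology).
            self.vision = nn.Sequential(
                nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, feat_dim),
            )
            # Multimodal subnetwork: an MLP over weather measurements
            # (e.g., temperature, humidity, pressure, wind speed -- assumed inputs).
            self.multimodal = nn.Sequential(
                nn.Linear(mm_dim, 64), nn.ReLU(),
                nn.Linear(64, feat_dim),
            )
            # Joint fusion layer: here simply concatenation + linear (assumption).
            self.fusion = nn.Linear(2 * feat_dim, feat_dim)
            self.classifier = nn.Linear(feat_dim, num_classes)

        def forward(self, image, mm):
            v = self.vision(image)
            m = self.multimodal(mm)
            fused = torch.relu(self.fusion(torch.cat([v, m], dim=1)))
            return self.classifier(fused)

        def extract_weighted_feature(self, image, mm, alpha=0.7):
            # Weighted integration of the two learned features after training;
            # alpha is a hypothetical weight, tuned on validation data.
            v = self.vision(image)
            m = self.multimodal(mm)
            return alpha * v + (1 - alpha) * m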
Since cloud images captured from different views exhibit extreme variations, multi-view ground-based cloud recognition is a very challenging task. In this paper, we present a study of view shift in this field, focusing both on designing a proper feature representation and on learning distance metrics from sample pairs. Correspondingly, we propose transfer deep local binary patterns (TDLBP) and weighted metric learning (WML). On one hand, to deal with view shift, such as variations in illumination, location, resolution, and occlusion, we first train a convolutional neural network (CNN) on cloud images, then extract local features from part summing maps (PSMs) built on the feature maps, and finally maximize the occurrences of regions to form the final feature representation. On the other hand, the number of cloud images per category varies greatly, leading to unbalanced similar pairs; hence, we propose a weighted strategy for metric learning. We validate the proposed method on three cloud datasets (MOC_e, IAP_e, and CAMS_e) collected by different meteorological organizations in China, and the experimental results show the effectiveness of the proposed method.
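The following is a minimal sketch of a weighted pairwise metric-learning objective in the spirit of WML: each sample pair carries a weight that compensates for the unbalanced counts of similar and dissimilar pairs. The contrastive-style formulation, the margin, and the weighting scheme are assumptions; the paper's exact loss may differ.

    import torch

    def weighted_pair_loss(x1, x2, same, weights, margin=1.0):
        """x1, x2: (N, D) feature batches forming N pairs; same: (N,) float
        tensor of 1 (similar pair) or 0 (dissimilar); weights: (N,) per-pair
        weights, e.g. inversely proportional to the pair type's frequency."""
        d = torch.norm(x1 - x2, dim=1)  # Euclidean distance per pair
        pos = same * d.pow(2)                                     # pull similar pairs together
        neg = (1 - same) * torch.clamp(margin - d, min=0).pow(2)  # push dissimilar pairs apart
        return (weights * (pos + neg)).mean()

A typical choice would be to set each pair's weight inversely proportional to how many pairs of that type the category contributes, so rare similar pairs are not drowned out by abundant ones.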
Driven by the rapid development of digital imaging and network technologies, opinion-unaware blind image quality assessment (BIQA) has become an important yet very challenging task. In this paper, we design a novel and effective scheme for opinion-unaware BIQA. We first utilize convolutional maps to select high-contrast patches, and then use the selected patches of pristine images to train a pristine multivariate Gaussian (PMVG) model. In the test stage, each high-contrast patch is fitted by a test MVG (TMVG) model, and the local quality score is obtained by comparing it with the PMVG. Finally, we propose deep activation pooling (DAP) to automatically emphasize the more important scores and suppress the less important ones, yielding the overall image quality score. We verify the proposed method on two widely used databases, namely the Computational and Subjective Image Quality (CSIQ) and Laboratory for Image and Video Engineering (LIVE) databases, and the experimental results demonstrate that the proposed method outperforms state-of-the-art methods.
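For readers unfamiliar with MVG-based quality scoring, here is a sketch of the Gaussian fitting and comparison step in NumPy, in the style popularized by NIQE-like opinion-unaware BIQA. The feature extraction and the DAP pooling weights are omitted, and the specific distance used is an assumption; only the pristine-vs-test MVG comparison is shown.

    import numpy as np

    def fit_mvg(features):
        """Fit a multivariate Gaussian (mean, covariance) to patch features
        of shape (n_patches, dim)."""
        mu = features.mean(axis=0)
        cov = np.cov(features, rowvar=False)
        return mu, cov

    def mvg_distance(mu_p, cov_p, mu_t, cov_t):
        """Mahalanobis-style distance between the pristine MVG (mu_p, cov_p)
        and a test MVG (mu_t, cov_t), with the averaged covariance as in
        NIQE; a larger distance indicates lower predicted quality."""
        diff = mu_p - mu_t
        cov = (cov_p + cov_t) / 2.0
        return float(np.sqrt(diff @ np.linalg.pinv(cov) @ diff))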
Cross-domain ground-based cloud classification is a challenging issue, as the appearance of cloud images from different cloud databases exhibits extreme variations. Two problems are fundamental to cross-domain ground-based cloud classification: feature representation and similarity measurement. In this paper, we propose an effective feature representation called transfer of local features (TLF) and a measurement method called discriminative metric learning (DML). TLF is a generalized representation framework that can integrate various kinds of local features, e.g., local binary patterns (LBP), local ternary patterns (LTP), and completed LBP (CLBP). To handle domain shift, such as variations in illumination, image resolution, capturing location, and occlusion, TLF mines the maximum response within regions to obtain a representation that is stable under domain variations. Simultaneously, we learn a discriminative metric, making use of sample pairs and the relationships among cloud classes to learn the distance metric. Furthermore, to improve the practicability of the proposed method, we replace the original cloud images with convolutional activation maps, to which TLF and DML are then applied. The proposed method has been validated on three cloud databases collected in China, provided by the Chinese Academy of Meteorological Sciences (CAMS), the Meteorological Observation Centre (MOC), and the Institute of Atmospheric Physics (IAP), and its classification accuracies outperform those of state-of-the-art methods.
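A brief sketch of the "maximum response over regions" idea behind TLF: one local descriptor (e.g., an LBP histogram) is computed per image region, and a bin-wise maximum over regions keeps each pattern's strongest response anywhere in the image, which is what lends stability under domain shift. The region grid and the choice of descriptor are assumptions.

    import numpy as np

    def tlf_representation(region_histograms):
        """region_histograms: (n_regions, hist_dim) array, one local-feature
        histogram per image region. Returns a (hist_dim,) vector holding,
        for each bin, its maximum response across all regions."""
        return region_histograms.max(axis=0)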
Recently, convolutional neural networks (CNNs) have dominated the ground-based cloud image segmentation task, but they disregard long-range dependencies due to the limited size of their filters. Although Transformer-based methods can overcome this limitation, they learn long-range dependencies only at a single scale and hence fail to capture the multi-scale information of cloud images. Multi-scale information is beneficial to ground-based cloud image segmentation, because features from small scales tend to capture detailed information while features from large scales can learn global information. In this paper, we propose a novel deep network named Integration Transformer (InTransformer), which builds long-range dependencies across different scales. To this end, we propose the Hybrid Multi-head Transformer Block (HMTB) to learn multi-scale long-range dependencies, and hybridize a CNN and HMTB as the encoder at different scales, so that the encoder extracts multi-scale representations capturing both local information and long-range dependencies. Meanwhile, to fuse patch tokens of different scales, we propose the Mutual Cross-Attention Module (MCAM) for the decoder of InTransformer, which enables multi-scale patch tokens to interact adequately in a bidirectional way. We have conducted a series of experiments on the large ground-based cloud detection database TLCDD and on SWIMSEG. The experimental results show that our method outperforms other methods, proving the effectiveness of the proposed InTransformer.
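Below is a minimal PyTorch sketch of bidirectional cross-attention between two token scales, illustrating the kind of mutual interaction MCAM describes. The dimensions, the residual connections, and the use of nn.MultiheadAttention are assumptions; the paper's module may be structured differently.

    import torch
    import torch.nn as nn

    class BidirectionalCrossAttention(nn.Module):
        def __init__(self, dim=256, heads=8):
            super().__init__()
            self.small_to_large = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.large_to_small = nn.MultiheadAttention(dim, heads, batch_first=True)

        def forward(self, tokens_small, tokens_large):
            # Small-scale tokens query the large-scale tokens, and vice versa,
            # so each scale is enriched with context from the other.
            s, _ = self.small_to_large(tokens_small, tokens_large, tokens_large)
            l, _ = self.large_to_small(tokens_large, tokens_small, tokens_small)
            return tokens_small + s, tokens_large + l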
Cloud image segmentation plays an important role in ground-based cloud observation. Most existing methods for ground-based cloud image segmentation learn feature representations using a convolutional neural network (CNN), which loses global information because of the limited receptive field of the CNN's filters. In this article, we propose a novel deep model named TransCloudSeg, which makes full use of the complementary advantages of the CNN and the transformer to extract detailed information and global contextual information for ground-based cloud image segmentation. Specifically, TransCloudSeg hybridizes a CNN and a transformer as encoders to obtain different features. To recover and fuse the feature maps from the encoders, we design a CNN decoder and a transformer decoder for TransCloudSeg. After obtaining two sets of feature maps from the two decoders, we propose the heterogeneous fusion module, which effectively fuses the heterogeneous feature maps by applying the self-attention mechanism. We conduct a series of experiments on the Tianjin Normal University large-scale cloud detection database and the Tianjin Normal University cloud detection database, and the results show that our method achieves better performance than other state-of-the-art methods, proving the effectiveness of the proposed TransCloudSeg.
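To illustrate the fusion step, here is a sketch of one way to fuse heterogeneous feature maps with self-attention: the CNN-decoder and transformer-decoder maps are flattened into a joint token sequence, mixed by a self-attention layer, and recombined. The single-layer design, the shapes, and the averaging-based recombination are assumptions, not the paper's exact module.

    import torch
    import torch.nn as nn

    class HeterogeneousFusion(nn.Module):
        def __init__(self, dim=64, heads=4):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.out = nn.Conv2d(dim, dim, 1)

        def forward(self, f_cnn, f_trans):
            # f_cnn, f_trans: (B, C, H, W) feature maps from the two decoders.
            b, c, h, w = f_cnn.shape
            t_cnn = f_cnn.flatten(2).transpose(1, 2)      # (B, HW, C)
            t_trans = f_trans.flatten(2).transpose(1, 2)  # (B, HW, C)
            tokens = torch.cat([t_cnn, t_trans], dim=1)   # (B, 2HW, C)
            # Self-attention lets every token attend across both feature sets.
            fused, _ = self.attn(tokens, tokens, tokens)
            # Recombine the two token groups (averaging is an assumption).
            fused = (fused[:, : h * w] + fused[:, h * w :]) / 2
            return self.out(fused.transpose(1, 2).reshape(b, c, h, w))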