Object tracking in RGB-thermal (RGB-T) videos is increasingly used in many fields, owing to the all-weather, all-day operating capability of the dual-modality imaging system and the rapid development of low-cost, miniaturized infrared camera technology. However, effectively fusing dual-modality information to build a robust RGB-T tracker remains very challenging. In this paper, an RGB-T object tracking algorithm based on a modal-aware attention network and competitive learning (MaCNet) is proposed, comprising a feature extraction network, a modal-aware attention network, and a classification network. The feature extraction network adopts a two-stream architecture to extract features from each modality image. The modal-aware attention network integrates the original data to establish an attention model that characterizes the importance of different feature layers, and then guides the feature fusion to enhance the information interaction between modalities. The classification network constructs a modality-egoistic loss function through three parallel binary classifiers acting on the RGB branch, the thermal infrared branch, and the fusion branch, respectively. Guided by a competitive-learning training strategy, the entire network is fine-tuned toward the optimal fusion of the two modalities. Extensive experiments on several publicly available RGB-T datasets show that our tracker achieves superior performance compared with recent RGB-T and RGB tracking approaches.
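As a rough illustration of this kind of two-stream design, the following PyTorch sketch wires a per-modality backbone, an attention block driven by statistics of the original images, and three parallel binary (target vs. background) classifiers for the RGB, thermal, and fused branches. The layer sizes, the attention formulation, and the classifier heads are assumptions made for illustration only; this is not the authors' released MaCNet code.

```python
# Minimal sketch of a two-stream RGB-T architecture in the spirit of MaCNet.
# Module sizes, the attention form, and the heads are illustrative assumptions.
import torch
import torch.nn as nn

class TwoStreamBackbone(nn.Module):
    """One small convolutional branch per modality (thermal assumed replicated to 3 channels)."""
    def __init__(self, out_channels=64):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, out_channels, 3, stride=2, padding=1), nn.ReLU(),
            )
        self.rgb_branch = branch()
        self.tir_branch = branch()

    def forward(self, rgb, tir):
        return self.rgb_branch(rgb), self.tir_branch(tir)

class ModalAwareAttention(nn.Module):
    """Predicts per-channel importance weights for each modality from pooled
    statistics of the raw inputs, then reweights the branch features before fusion."""
    def __init__(self, channels=64):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(6, 32), nn.ReLU(),
            nn.Linear(32, 2 * channels), nn.Sigmoid(),
        )
        self.channels = channels

    def forward(self, rgb, tir, f_rgb, f_tir):
        # Global statistics of the original images drive the attention weights.
        stats = torch.cat([rgb.mean(dim=(2, 3)), tir.mean(dim=(2, 3))], dim=1)
        w = self.fc(stats)
        w_rgb, w_tir = w[:, :self.channels], w[:, self.channels:]
        f_rgb = f_rgb * w_rgb[:, :, None, None]
        f_tir = f_tir * w_tir[:, :, None, None]
        return f_rgb, f_tir, torch.cat([f_rgb, f_tir], dim=1)

class MaCNetSketch(nn.Module):
    """Three parallel binary classifiers (RGB-only, thermal-only, fused),
    which a competitive-learning style loss could then compare during training."""
    def __init__(self, channels=64):
        super().__init__()
        self.backbone = TwoStreamBackbone(channels)
        self.attention = ModalAwareAttention(channels)
        def head(in_ch):
            return nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(in_ch, 2))
        self.cls_rgb, self.cls_tir, self.cls_fuse = head(channels), head(channels), head(2 * channels)

    def forward(self, rgb, tir):
        f_rgb, f_tir = self.backbone(rgb, tir)
        f_rgb, f_tir, f_fuse = self.attention(rgb, tir, f_rgb, f_tir)
        return self.cls_rgb(f_rgb), self.cls_tir(f_tir), self.cls_fuse(f_fuse)

if __name__ == "__main__":
    model = MaCNetSketch()
    rgb = torch.randn(2, 3, 107, 107)   # candidate patches, RGB modality
    tir = torch.randn(2, 3, 107, 107)   # same patches, thermal modality
    print([o.shape for o in model(rgb, tir)])  # three (2, 2) score tensors
```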
The semantic segmentation of remote sensing images (RSIs) is important in a variety of applications. Conventional encoder-decoder convolutional neural networks (CNNs) use cascaded pooling operations to aggregate semantic information, which results in a loss of localization accuracy and of fine spatial details. To overcome these limitations, we introduce the high-resolution network (HRNet) to produce high-resolution features without a decoding stage. Moreover, we enhance the low-to-high features extracted from different branches separately to strengthen the embedding of scale-related contextual information. The low-resolution features contain more semantic information and have a small spatial size; thus, they are used to model long-range spatial correlations. The high-resolution branches are enhanced by introducing an adaptive spatial pooling (ASP) module that aggregates more local context. By combining these context aggregation designs across levels, the resulting architecture can exploit spatial context at both global and local scales. The experimental results obtained on two RSI datasets show that our approach significantly improves accuracy compared with commonly used CNNs and achieves state-of-the-art performance.
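To make the local context aggregation concrete, here is a small PyTorch sketch of an adaptive-spatial-pooling style block applied to a high-resolution branch: features are pooled to several coarse grids, transformed, upsampled, and fused back into the input map. The pooling grid sizes and the fusion-by-concatenation scheme are assumptions for illustration, not the exact ASP design in the paper.

```python
# Illustrative adaptive-spatial-pooling style block for a high-resolution branch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveSpatialPooling(nn.Module):
    def __init__(self, channels, pool_sizes=(2, 4, 8)):
        super().__init__()
        self.pool_sizes = pool_sizes
        self.convs = nn.ModuleList(
            [nn.Conv2d(channels, channels, kernel_size=1) for _ in pool_sizes]
        )
        self.project = nn.Conv2d(channels * (len(pool_sizes) + 1), channels, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[2:]
        branches = [x]
        for size, conv in zip(self.pool_sizes, self.convs):
            # Pool to a coarse grid, transform, then upsample back: each pixel
            # receives context aggregated over a local neighbourhood.
            pooled = F.adaptive_avg_pool2d(x, output_size=size)
            context = F.interpolate(conv(pooled), size=(h, w), mode="bilinear",
                                    align_corners=False)
            branches.append(context)
        return self.project(torch.cat(branches, dim=1))

if __name__ == "__main__":
    asp = AdaptiveSpatialPooling(channels=48)
    feats = torch.randn(1, 48, 128, 128)   # a high-resolution HRNet branch
    print(asp(feats).shape)                # torch.Size([1, 48, 128, 128])
```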
This paper presents a novel multi-task learning-based method for unsupervised domain adaptation. Specifically, the source and target domain classifiers are jointly learned by considering the geometry of the target domain and the divergence between the source and target domains, based on the concept of multi-task learning. Two novel algorithms are built upon this method, using Regularized Least Squares and Support Vector Machines, respectively. Experiments on both synthetic and real-world cross-domain recognition tasks show that the proposed methods outperform several state-of-the-art domain adaptation methods.
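As a rough numerical illustration of such a joint formulation, the NumPy sketch below learns source and target linear classifiers with Regularized Least Squares, coupling them through a multi-task penalty while adding a projected-mean divergence term and a graph-Laplacian term standing in for target-domain geometry. The choice of regularizers, their weights, and the graph construction are all assumptions for illustration, not the paper's exact algorithm.

```python
# Sketch: jointly learn source and target linear classifiers via Regularized
# Least Squares with a multi-task coupling, a projected-mean divergence term,
# and a target-geometry (graph Laplacian) term. All terms are illustrative.
import numpy as np

def rbf_laplacian(X, sigma=1.0):
    """Unnormalized graph Laplacian of an RBF affinity graph over samples X."""
    sq = np.sum(X**2, axis=1)
    dists = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    W = np.exp(-dists / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(axis=1)) - W

def joint_rls(Xs, ys, Xt, lam=1.0, gamma=1.0, mu=0.1, beta=0.1):
    """Returns (w_source, w_target) minimizing a quadratic multi-task objective."""
    d = Xs.shape[1]
    L = rbf_laplacian(Xt)
    ms, mt = Xs.mean(axis=0), Xt.mean(axis=0)
    I = np.eye(d)
    # Block linear system A @ [w_s; w_t] = b assembled from the quadratic terms:
    # source fit + ridge + task coupling + projected-mean divergence + Laplacian.
    A = np.zeros((2 * d, 2 * d))
    A[:d, :d] = Xs.T @ Xs + lam * I + gamma * I + beta * np.outer(ms, ms)
    A[:d, d:] = -gamma * I - beta * np.outer(ms, mt)
    A[d:, :d] = A[:d, d:].T
    A[d:, d:] = lam * I + gamma * I + mu * (Xt.T @ L @ Xt) + beta * np.outer(mt, mt)
    b = np.concatenate([Xs.T @ ys, np.zeros(d)])
    w = np.linalg.solve(A, b)
    return w[:d], w[d:]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Xs = rng.normal(size=(100, 5))
    ys = np.sign(Xs[:, 0] + 0.1 * rng.normal(size=100))   # labels from feature 0
    Xt = Xs + rng.normal(0.5, 0.2, size=Xs.shape)          # shifted target domain
    w_s, w_t = joint_rls(Xs, ys, Xt)
    print("target-domain accuracy:", np.mean(np.sign(Xt @ w_t) == ys))
```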