Contrastive Self-Supervised Learning With Smoothed Representation for Remote Sensing

Jung, Heechul; Oh, Yoonju; Jeong, Seong-Ho; Lee, Chaehyeon; Jeon, Taegyun

doi:10.1109/lgrs.2021.3069799

Cited by 42 publications (27 citation statements). References 16 publications.


“…Most existing methods apply InfoNCE loss [18, 22, 24-28, 30, 33, 62] or triplet loss [17,60] on the constructed positive and negative pairs. Positive samples can be obtained by different artificial augmentations (e.g., color and geometric transformations) of the same image [25,28], spatial augmentations (i.e., geospatially overlapped images) [24,27,60], temporal augmentations (i.e., multi-temporal co-registered images) [16,17,22,26,62], and modality augmentations (e.g., optical image, SAR, and semantic mask) [29,63]. Negative pairs can be different samples in a minibatch or spatially distinct images [60,62].…”

Section: Semantic Dissimilarity (mentioning)

confidence: 99%
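The snippet above names InfoNCE with in-batch negatives as the most common objective. As a minimal NumPy sketch (not code from any of the cited works), assume `queries[i]` and `keys[i]` are embeddings of two augmented views of the same image, so row i of `keys` is the positive for row i of `queries` and every other row in the minibatch is a negative. The function name `info_nce` and the default temperature of 0.1 are illustrative assumptions.

```python
import numpy as np


def info_nce(queries: np.ndarray, keys: np.ndarray, temperature: float = 0.1) -> float:
    """InfoNCE loss with in-batch negatives (illustrative sketch).

    queries[i] and keys[i] form a positive pair; keys[j] for j != i
    serve as negatives for queries[i].
    """
    # L2-normalise rows so the dot product is cosine similarity.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = q @ k.T / temperature  # (N, N) similarity matrix
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal; loss is their mean negative log-likelihood.
    return float(-np.mean(np.diag(log_prob)))
```

When all embeddings are identical every logit is equal and the loss reduces to log N (chance level); well-aligned positive pairs drive it toward zero, while mismatched pairs push it up.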

“…Contrastive SSL [20,21] could learn useful representations from massive unlabeled data by pulling together representations of semantically similar samples (i.e., positive pairs) and pushing away those of dissimilar samples (i.e., negative pairs). Very recently, contrastive methods have been introduced in the RS domain [16][17][18][22][23][24][25][26][27][28][29][30][31][32][33] and have shown promising performance for the downstream supervised CD task [16][17][18].…”

Section: Introduction (mentioning)

confidence: 99%
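The "pull together / push away" description maps most directly onto the triplet loss, the other objective listed in the first snippet. A minimal NumPy sketch follows, assuming Euclidean distance and a hypothetical default margin of 1.0; the cited methods may use other distances, margins, or mining strategies.

```python
import numpy as np


def triplet_loss(anchor: np.ndarray, positive: np.ndarray,
                 negative: np.ndarray, margin: float = 1.0) -> float:
    """Hinge-style triplet loss on batches of embeddings (rows = samples)."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)  # distance to the positive
    d_neg = np.linalg.norm(anchor - negative, axis=1)  # distance to the negative
    # Penalise triplets where the negative is not at least `margin` farther away.
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))
```

A triplet whose negative is already farther than the positive by more than the margin contributes zero loss, so training effort concentrates on pairs that are still confused.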


Semantic-aware Dense Representation Learning for Remote Sensing Image Change Detection

Chen, Li, Chen et al., 2022. Preprint


Training deep learning-based change detection (CD) models depends heavily on labeled data. Contemporary transfer learning-based methods for alleviating CD label insufficiency mainly rely on ImageNet pre-training. A recent trend is to use remote sensing (RS) data to obtain in-domain representations via supervised or self-supervised learning (SSL). Here, unlike traditional supervised pre-training that learns a mapping from image to label, we leverage semantic supervision in a contrastive manner. RS images typically contain multiple objects of interest (e.g., buildings) distributed across varying locations. We propose dense semantic-aware pre-training for RS image CD by sampling multiple class-balanced points. Instead of manipulating image-level representations that lack spatial information, we constrain pixel-level cross-view consistency and cross-semantic discrimination to learn spatially sensitive features, thus benefiting downstream dense CD. Beyond learning illumination-invariant features, we learn consistent foreground features that are insensitive to irrelevant background changes via a synthetic view created by background swapping. We additionally learn discriminative representations that distinguish foreground land covers from other backgrounds. We collect large-scale image-mask pairs freely available in the RS community for pre-training. Extensive experiments on three CD datasets verify the effectiveness of our method, which significantly outperforms ImageNet pre-training, in-domain supervision, and several SSL methods. Empirical results indicate that our method effectively alleviates data insufficiency in CD. Notably, using only 20% of the training data, we achieve results competitive with a randomly initialized baseline trained on 100% of the data. Both quantitative and qualitative results demonstrate that our pre-trained model generalizes to downstream images even when domain gaps with the pre-training data remain. Our data and code will be made public.
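The abstract mentions two data-side ingredients: class-balanced point sampling from a semantic mask and a synthetic view built by background swapping. The sketch below shows one plausible reading of each, under stated assumptions: `mask` is an integer label map, every class present receives the same point budget (sampling with replacement when a class has too few pixels), and the swap simply pastes the foreground pixels of one image onto another co-registered image of the same size. The function names are hypothetical and this is not the authors' released code.

```python
import numpy as np


def sample_class_balanced_points(mask: np.ndarray, points_per_class: int,
                                 rng: np.random.Generator) -> list:
    """Return (row, col, label) samples with an equal budget per class in `mask`.

    Hypothetical helper: the paper's exact sampling rule is not given here.
    """
    samples = []
    for label in np.unique(mask):
        rows, cols = np.nonzero(mask == label)
        # Fall back to sampling with replacement for classes smaller than the budget.
        replace = rows.size < points_per_class
        idx = rng.choice(rows.size, size=points_per_class, replace=replace)
        samples.extend((int(rows[i]), int(cols[i]), int(label)) for i in idx)
    return samples


def swap_background(image: np.ndarray, mask: np.ndarray,
                    other_image: np.ndarray, foreground_label: int = 1) -> np.ndarray:
    """Paste the foreground of `image` onto the background of `other_image`.

    Assumes both images are (H, W, C) arrays aligned with the (H, W) mask.
    """
    fg = (mask == foreground_label)[..., None]  # (H, W, 1), broadcast over channels
    return np.where(fg, image, other_image)
```

Giving every class the same budget presumably keeps small foreground objects such as buildings from being swamped by background pixels when pixel-level contrastive pairs are formed.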


“…This task is illustrated in Figure 3 and can be formalized as $x_k^{+} = \mathrm{aug}(\mathrm{sample\_neighbor}(x_q))$, where aug is the same as above and sample_neighbor generates a geographically close patch. Inspired by [26], this strategy aims to help the network to better cluster together similar regions (land, water bodies, etc.). The maximum distance can be varied to control the average overlap of sampled patches.…”

Section: Pretext Task Settings (mentioning)

confidence: 99%
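The neighbour-sampling pretext task can be sketched as cropping a second patch whose top-left corner is offset from the query patch by at most `max_dist` pixels per axis. This NumPy sketch assumes a single large raster `tile`, square patches, and a uniform integer offset; the augmentation `aug` from the snippet is not reproduced, and the signature of `sample_neighbor` here, as well as the `overlap_fraction` helper, are hypothetical choices for illustration.

```python
import numpy as np


def sample_neighbor(tile: np.ndarray, row: int, col: int, size: int,
                    max_dist: int, rng: np.random.Generator):
    """Crop a size x size patch whose corner lies within max_dist pixels
    (per axis) of (row, col), clipped so the patch stays inside `tile`.

    Hypothetical signature; returns the patch and its top-left corner.
    """
    h, w = tile.shape[:2]
    dr, dc = rng.integers(-max_dist, max_dist + 1, size=2)
    r = int(np.clip(row + dr, 0, h - size))
    c = int(np.clip(col + dc, 0, w - size))
    return tile[r:r + size, c:c + size], (r, c)


def overlap_fraction(a, b, size: int) -> float:
    """Fraction of a size x size patch at corner `a` shared with the one at corner `b`."""
    dy = max(0, size - abs(a[0] - b[0]))
    dx = max(0, size - abs(a[1] - b[1]))
    return dy * dx / (size * size)
```

Because the offset is bounded by `max_dist`, raising it lowers the expected `overlap_fraction` between the query and its positive, which corresponds to the overlap control described in the snippet.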

Ship Detection in Sentinel 2 Multi-Spectral Images with Self-Supervised Learning

Ciocarlan, Stoian, 2021. Remote Sensing


Automatic ship detection provides an essential function towards maritime domain awareness for security or economic monitoring purposes. This work presents an approach for training a deep learning ship detector in Sentinel-2 multi-spectral images with few labeled examples. We design a network architecture for detecting ships with a backbone that can be pre-trained separately. By using self-supervised learning, an emerging unsupervised training procedure, we learn good features on Sentinel-2 images, without requiring labeling, to initialize our network's backbone. The full network is then fine-tuned to learn to detect ships in challenging settings. We evaluated this approach versus pre-training on ImageNet and versus a classical image processing pipeline. We examined the impact of variations in the self-supervised learning step and show that, in the few-shot learning setting, self-supervised pre-training achieves better results than ImageNet pre-training. When enough training data are available, our self-supervised approach is as good as ImageNet pre-training. We conclude that a better design of the self-supervised task and larger non-annotated datasets can lead to surpassing ImageNet pre-training performance without any annotation costs.


“…Unlike RS, in the computer vision (CV) community, unsupervised and in particular self-supervised cross-modal representation learning methods (which only rely on the alignments between modalities) are widely studied [8][9][10][11][12][13]. As an example, in [9] a deep joint-semantics reconstructing hashing (DJSRH) method is introduced to learn binary codes that preserve the neighborhood structure in the original data.…”

Section: Introduction (mentioning)

confidence: 99%

Unsupervised Contrastive Hashing for Cross-Modal Retrieval in Remote Sensing

Mikriukov, Ravanbakhsh, Demir, 2022. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


The development of cross-modal retrieval systems that can search and retrieve semantically relevant data across different modalities based on a query in any modality has attracted great attention in remote sensing (RS). In this paper, we focus on cross-modal text-image retrieval, where queries from one modality (e.g., text) can be matched to archive entries from another (e.g., image). Most existing cross-modal text-image retrieval systems in RS require a large number of labeled training samples and do not allow fast and memory-efficient retrieval. These issues limit the applicability of existing cross-modal retrieval systems to large-scale applications in RS. To address this problem, we introduce a novel unsupervised cross-modal contrastive hashing (DUCH) method for text-image retrieval in RS. The proposed DUCH is made up of two main modules: 1) a feature extraction module, which extracts deep representations of the two modalities; and 2) a hashing module, which learns to generate cross-modal binary hash codes from the extracted representations. We introduce a novel multi-objective loss function including: i) contrastive objectives that enable the preservation of intra- and inter-modal similarities; ii) an adversarial objective that is enforced across the two modalities for cross-modal representation consistency; and iii) binarization objectives for generating hash codes. Experimental results show that the proposed DUCH outperforms state-of-the-art methods. Our code is publicly available at https://git.tu-berlin.de/rsim/duch.
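The abstract describes binary hash codes generated from continuous features plus a binarization objective. The following NumPy sketch shows generic deep-hashing mechanics rather than DUCH's exact losses: sign-based binarization to {-1, +1}, a squared quantization penalty, and retrieval by Hamming distance computed from an inner product. The function names and the tie-breaking rule at zero are assumptions.

```python
import numpy as np


def binarize(h: np.ndarray) -> np.ndarray:
    """Map continuous hash-layer outputs to codes in {-1, +1} (ties at 0 -> +1)."""
    return np.where(h >= 0, 1, -1).astype(np.int8)


def quantization_loss(h: np.ndarray) -> float:
    """Mean squared gap between continuous outputs and their binary codes."""
    return float(np.mean((binarize(h) - h) ** 2))


def hamming_rank(query_code: np.ndarray, db_codes: np.ndarray):
    """Return database indices sorted by Hamming distance, plus the distances.

    For {-1, +1} codes of length k, Hamming distance = (k - <q, b>) / 2.
    """
    k = query_code.size
    # Cast to int32 so the inner product cannot overflow int8.
    dist = (k - db_codes.astype(np.int32) @ query_code.astype(np.int32)) // 2
    return np.argsort(dist, kind="stable"), dist
```

Because the codes are binary, ranking a text query against an image archive needs only integer inner products (or XOR and popcount on packed bits), which is the usual source of the speed and memory savings that hashing methods target.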
