RingMo: A Remote Sensing Foundation Model With Masked Image Modeling

Sun, Xian; Wang, Peijin; Lu, Wanxuan; Zhu, Zicong; Lü, Xiaonan; He, Qibin; Li, Junxi; Rong, Xuee; Yang, Zhujun; Chang, Hao; He, Qian; Yang, Guang; Wang, Ruiping; Lu, Jiwen; Fu, Kun

doi:10.1109/tgrs.2022.3194732

Cited by 78 publications

(57 citation statements)

References 113 publications


“…In this work, we propose a methodology to significantly improve generalizability of DL natural hazards mappers based on pre-training on a suitable pre-task. Our approach supports the development of foundation models for earth monitoring, such as [9], with the objective of directly segmenting unseen natural hazards across unseen geographic regions. Our contributions are as follows: First, we demonstrate across four U-Net architectures that our approach significantly improves the generalizability of DL models for the segmentation of unseen natural hazards.…”

Section: Motivation (mentioning)

confidence: 99%

Toward Foundation Models for Earth Monitoring: Generalizable Deep Learning Models for Natural Hazard Segmentation

Jakubik, Muszyński, Vössing et al. 2023

Preprint


“…SatMAE [23] leveraged temporal and multi-spectral information in RS images to improve self-supervised pre-training with MIM. RingMo [14] applied the MAE [29] method and designed a new mask strategy for self-supervised representation learning on a 3 million unlabeled RS images dataset. The fine-tuning results on various downstream tasks showed that the new mask strategy was more appropriate for RS images and the learned representations by RingMo were generalized well to various RS downstream tasks.…”

Section: Self-supervised Learning In Remote Sensing (mentioning)

confidence: 99%

“…2) Adaptation for RS Images: Although the simplicity and effectiveness of SimMIM, there are some limitations must be taken into account when applying SimMIM into RS images. One issue is that SimMIM replaces masked patches with the [MASK] token, but RS images are known for their multiobject characteristics [36] and the objects are usually densely distributed [14]. The masking operation may cause the dense and small objects in the image to be lost [14], leading to incomplete semantic meaning and making image reconstruction more difficult [14].…”

Section: B Masked Image Modeling Branch (mentioning)

confidence: 99%

“…One issue is that SimMIM replaces masked patches with the [MASK] token, but RS images are known for their multiobject characteristics [36] and the objects are usually densely distributed [14]. The masking operation may cause the dense and small objects in the image to be lost [14], leading to incomplete semantic meaning and making image reconstruction more difficult [14]. Moreover, the zero-initialized mask token [MASK] is not originally presented in the image.…”

Section: B Masked Image Modeling Branch (mentioning)

confidence: 99%
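
The masking step that the two statements above describe can be illustrated with a short sketch. This is a minimal NumPy illustration under stated assumptions, not the SimMIM or RingMo implementation: the function name `mask_patches` and its parameters are hypothetical, and a constant `mask_value` written into raw pixels stands in for the learnable [MASK] token, which a real model applies to patch embeddings.

```python
import numpy as np


def mask_patches(image, patch_size, mask_ratio, mask_value=0.0, seed=0):
    """Overwrite a random subset of non-overlapping patches with a constant.

    Hypothetical helper for illustration only. Returns the masked copy of
    the image and the patch-level boolean mask (True = masked).
    """
    h, w = image.shape[:2]
    if h % patch_size or w % patch_size:
        raise ValueError("image sides must be multiples of patch_size")

    rows, cols = h // patch_size, w // patch_size
    num_patches = rows * cols
    num_masked = int(round(mask_ratio * num_patches))

    # Choose which patches to hide, without repetition.
    rng = np.random.default_rng(seed)
    masked_ids = rng.choice(num_patches, size=num_masked, replace=False)
    patch_mask = np.zeros(num_patches, dtype=bool)
    patch_mask[masked_ids] = True
    patch_mask = patch_mask.reshape(rows, cols)

    # Expand the patch-level mask to pixel resolution.
    pixel_mask = patch_mask.repeat(patch_size, axis=0).repeat(patch_size, axis=1)

    # Every pixel inside a masked patch is replaced, so an object smaller
    # than one patch can vanish entirely from the model's input.
    masked = image.copy()
    masked[pixel_mask] = mask_value
    return masked, patch_mask


image = np.ones((8, 8))
masked, patch_mask = mask_patches(image, patch_size=4, mask_ratio=0.5)
```

With a 4-pixel patch on an 8x8 image there are four patches, and a mask ratio of 0.5 hides two of them completely. Any object that fits inside one hidden patch leaves no trace in the input, which is the concern the quoted passage raises for dense, small objects in RS images.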

“…SSL methods learn from vast amounts of unlabeled data by leveraging the structure present in the data itself to create supervised signals [10], [11], [12]. This possibility to train deep learning models without human-annotated labels, combined with the impressive results achieved by SSL methods on natural images [11], [13], has prompted numerous investigations into adopting SSL for RS images [14], [15], [16]. Contrastive learning (CL) and masked image modeling (MIM) are currently the most widely-adopted SSL methods in the domain of RS [10].…”

Section: Introduction (mentioning)

confidence: 99%


CMID: A Unified Self-Supervised Learning Framework for Remote Sensing Image Understanding

Muhtar, Zhang, Xiao et al. 2023

IEEE Trans. Geosci. Remote Sensing


Self-supervised learning (SSL) has gained widespread attention in the remote sensing (RS) and earth observation (EO) communities owing to its ability to learn task-agnostic representations without human-annotated labels. Nevertheless, most existing RS SSL methods are limited to learning either global semantic separable or local spatial perceptible representations. We argue that this learning strategy is suboptimal in the realm of RS, since the required representations for different RS downstream tasks are often varied and complex. In this study, we proposed a unified SSL framework that is better suited for RS images representation learning. The proposed SSL framework, Contrastive Mask Image Distillation (CMID), is capable of learning representations with both global semantic separability and local spatial perceptibility by combining contrastive learning (CL) with masked image modeling (MIM) in a self-distillation way. Furthermore, our CMID learning framework is architecture-agnostic, which is compatible with both convolutional neural networks (CNN) and vision transformers (ViT), allowing CMID to be easily adapted to a variety of deep learning (DL) applications for RS understanding. Comprehensive experiments have been carried out on four downstream tasks (i.e. scene classification, semantic segmentation, object-detection, and change detection) and the results show that models pre-trained using CMID achieve better performance than other state-of-the-art SSL methods on multiple downstream tasks. The code and pre-trained models will be made available at https://github.com/NJU-LHRS/official-CMID to facilitate SSL research and speed up the development of RS images DL applications.


