Semantic Understanding of Scenes through the ADE20K Dataset

Zhou, Bolei; Zhao, Hang; Puig, Xavier; Xiao, Tete; Fidler, Sanja; Barriuso, Adela; Torralba, Antonio

doi:10.48550/arxiv.1608.05442

Cited by 84 publications (118 citation statements)

References 28 publications

Supporting

Mentioning

112

Contrasting

“…Fine-tuning: We employed three data sets: ImageNet (ILSVRC 2012) [49] for single-label classification, MS-COCO [34] for object detection, and ADE20K [60] for semantic segmentation. The number of images used was 1% of each data set (roughly, 12 thousand for ImageNet, a thousand for COCO, and 2 hundred for ADE20K).…”

Section: Pre-training (mentioning)

confidence: 99%

A Scaling Law for Synthetic-to-Real Transfer: How Much Is Your Pre-training Effective?

Mikami, Fukumizu, Murai et al. 2021

Preprint

Synthetic-to-real transfer learning is a framework in which we pre-train models with synthetically generated images and ground-truth annotations for real tasks. Although synthetic images overcome the data scarcity issue, it remains unclear how the fine-tune performance scales with pre-trained models, especially in terms of pre-training data size. In this study, we collect a number of empirical observations and uncover the secret. Through experiments, we observe a simple and general scaling law that consistently describes learning curves in various tasks, models, and complexities of synthesized pre-training data. Further, we develop a theory of transfer learning for a simplified scenario and confirm that the derived generalization bound is consistent with our empirical findings.

“…The training and testing sets consist of about 4998 and 5105 images respectively. ADE-20k: ADE-20k [16] is a challenging dataset that contains 22K densely annotated images with 150 fine-grained semantic concepts. The training and validation sets consist of 20210 and 2000 images respectively.…”

Section: A Benchmarks (mentioning)

confidence: 99%

“…That means each point shares the different global context since they have the appearance variation locally. Visualization of Error Map: Figure 10 gives error map on both Cityscapes [73] and ADE20k [16] validation datasets using ASSP as GA head baselines. In particular, we use ResNet101 backbone as a strong baseline and LDv2 as the LD module.…”

Section: Visualization Of Local Affinity Map On Sampled Points (mentioning)

confidence: 99%

“…(ii) We conduct comprehensive ablation studies to verify the proposed method including quantitative improvements over baselines and visualization analysis. (iii) We conduct extensive experiments on four more challenging semantic segmentation datasets including Camvid [14], Pascal Context [15], ADE-20k [16] and COCO-stuff [17] where our method establishes new state of the art.…”

Section: Introduction (mentioning)

confidence: 99%

Global Aggregation then Local Distribution for Scene Parsing

Li, Zhang, Cheng et al. 2021

Preprint

Modelling long-range contextual relationships is critical for pixel-wise prediction tasks such as semantic segmentation. However, convolutional neural networks (CNNs) are inherently limited to model such dependencies due to the naive structure in its building modules (e.g., local convolution kernel). While recent global aggregation methods are beneficial for long-range structure information modelling, they would oversmooth and bring noise to the regions that contain fine details (e.g., boundaries and small objects), which are very much cared in the semantic segmentation task. To alleviate this problem, we propose to explore the local context for making the aggregated long-range relationship being distributed more accurately in local regions. In particular, we design a novel local distribution module which models the affinity map between global and local relationship for each pixel adaptively. Integrating existing global aggregation modules, we show that our approach can be modularized as an end-to-end trainable block and easily plugged into existing semantic segmentation networks, giving rise to the GALD networks. Despite its simplicity and versatility, our approach allows us to build new state of the art on major semantic segmentation benchmarks including Cityscapes, ADE20K, Pascal Context, Camvid and COCO-stuff. Code and trained models are released at https://github.com/lxtGH/GALD-DGCNet to foster further research.

“…One important direction of visual pattern recognition in agriculture is aerial image semantic segmentation. Different from conventional image semantic segmentation dataset where only RGB based image is available [5,20,19,13], the agricultural data collection process utilizes specific cameras to capture Red, Green and Blue channel(RGB) with an additional near-infrared(NIR) signal channel which can be used in the pattern recognition process [4]. Also, agricultural data is naturally imbalanced.…”

Section: Introduction (mentioning)

confidence: 99%

Reducing the feature divergence of RGB and near-infrared images using Switchable Normalization

Yang, Zhao et al. 2020

2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

Visual pattern recognition over agricultural areas is an important application of aerial image processing. In this paper, we consider the multi-modality nature of agricultural aerial images and show that naively combining different modalities together without taking the feature divergence into account can lead to sub-optimal results. Thus, we apply a Switchable Normalization block to our DeepLabV3+ segmentation model to alleviate the feature divergence. Using the popular symmetric Kullback-Leibler divergence measure, we show that our model can greatly reduce the divergence between RGB and near-infrared channels. Together with a hybrid loss function, our model achieves nearly 10% improvements in mean IoU over previously published baseline.
