Object-Contextual Representations for Semantic Segmentation

Yuan, Yuhui; Chen, Xilin; Wang, Jingdong

doi:10.1007/978-3-030-58539-6_11

Cited by 794 publications

(567 citation statements)

References 63 publications

Mentioning: 550


“…DenseASPP [34] combines a dense skip connection with ASPP, which effectively enlarges the receptive field size of the network. Recently, inspired by the success of the attention mechanism in natural language processing, the self-attention mechanism has also been applied to aggregate the dense pixel-wise context [18, 35, 36, 37]. The major drawback of self-attention is that it has excessive computation and memory consumption.…”

Section: Aggregation of the Multi-scale Context (mentioning)

confidence: 99%
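
To make the memory concern in the quote above concrete, the following sketch counts the bytes needed to store one dense pixel-to-pixel attention matrix. The function name, the 128 x 256 feature-map size (roughly a 1024 x 2048 image at output stride 8), and the float32 element size are illustrative assumptions, not figures taken from the citing paper.

```python
def attention_matrix_bytes(height, width, bytes_per_element=4):
    """Bytes needed for one dense (H*W) x (H*W) attention matrix."""
    num_pixels = height * width
    # Every pixel attends to every other pixel, so storage grows as N^2.
    return num_pixels * num_pixels * bytes_per_element


# Assumed example: a 128 x 256 feature map stored as float32.
size = attention_matrix_bytes(128, 256)
print(f"{size / 1024**3:.1f} GiB")  # prints "4.0 GiB"
```

Doubling both spatial dimensions multiplies this figure by 16, which is why dense self-attention is usually applied to downsampled features.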

“…Evidently, this way increases the complexity and the parameter amount of the model. The others focus on exploiting the online hard example mining strategy [29,37] and perceptual loss [41], which both require careful re-training or fine-tuning of the hyperparameters.…”

Section: Boundary Refinement (mentioning)

confidence: 99%

“…Test settings: Following the works [29,30,36,37], the multi-scale (MS) inference strategy was employed. When using multi-scale inference, the final results were generated by averaging all predictions with scales {0.75, 1.0, 1.25, 1.50, 1.75}.…”

Section: Experimental Settings (mentioning)

confidence: 99%
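
The multi-scale inference strategy quoted above can be sketched roughly as follows. This is a minimal illustration under stated assumptions: `model` is a hypothetical callable that maps an (H, W, C) image to (H, W, K) per-class scores, the averaging is done on those scores, and nearest-neighbour resizing stands in for the bilinear interpolation that real pipelines typically use. Only the scale set comes from the quote.

```python
import numpy as np


def resize_nearest(arr, out_h, out_w):
    """Nearest-neighbour resize of an (H, W, ...) array (simplification)."""
    h, w = arr.shape[:2]
    rows = np.minimum((np.arange(out_h) * h / out_h).astype(int), h - 1)
    cols = np.minimum((np.arange(out_w) * w / out_w).astype(int), w - 1)
    return arr[rows][:, cols]


def multi_scale_predict(model, image, scales=(0.75, 1.0, 1.25, 1.5, 1.75)):
    """Average per-class scores over rescaled copies of the input image."""
    h, w = image.shape[:2]
    total = None
    for s in scales:
        sh, sw = max(1, round(h * s)), max(1, round(w * s))
        scores = model(resize_nearest(image, sh, sw))  # (sh, sw, K)
        scores = resize_nearest(scores, h, w)          # back to (h, w, K)
        total = scores if total is None else total + scores
    mean_scores = total / len(scales)
    # Final label map: the class with the highest averaged score per pixel.
    return mean_scores.argmax(axis=-1), mean_scores
```

Averaging scores before taking the argmax lets scales that disagree on a pixel vote with their confidence rather than with a hard label.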


Boundary-Aware Refined Network for Automatic Building Extraction in Very High-Resolution Urban Aerial Images

Jin, Zhang, et al. 2021, Remote Sensing


Convolutional Neural Networks (CNNs), such as U-Net, have shown competitive performance in the automatic extraction of buildings from Very High-Resolution (VHR) aerial images. However, due to the unstable multi-scale context aggregation, the insufficient combination of multi-level features and the lack of consideration of the semantic boundary, most existing CNNs produce incomplete segmentation for large-scale buildings and result in predictions with huge uncertainty at building boundaries. This paper presents a novel network with a special boundary-aware loss embedded, called the Boundary-Aware Refined Network (BARNet), to address the gap above. The unique properties of the proposed BARNet are the gated-attention refined fusion unit, the denser atrous spatial pyramid pooling module, and the boundary-aware loss. The performance of the BARNet is tested on two popular data sets that include various urban scenes and diverse patterns of buildings. Experimental results demonstrate that the proposed method outperforms several state-of-the-art approaches in both visual interpretation and quantitative evaluations.


“…The space attention module can aggregate different features at different positions, and the channel attention module can integrate the correlation features between different channels to yield more precise results. OCNet [34] employs the strategy of aggregating contextual information using ground truth values to supervise the learning of target areas, uses the corresponding object context representation to characterize the pixels, and then calculates the relationship between each pixel and each target area via object-contextual representation to extend the representation of each pixel. CCNet [35] proposed a new crisscross attention model to obtain the context information of nearby pixels.…”

Section: Related Work (mentioning)

confidence: 99%
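
The pixel-to-region procedure described in the quote above can be sketched in a few lines of NumPy. This is a simplified illustration, not the cited implementation: the function and argument names are hypothetical, the learned transformations used in practice are omitted, and plain dot products are assumed as the similarity measure.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def object_contextual_augment(pixel_feats, region_scores):
    """pixel_feats: (N, C) features; region_scores: (N, K) coarse class scores."""
    # 1. Region representations: score-weighted average of pixel features.
    spatial_weights = softmax(region_scores, axis=0)           # (N, K)
    region_feats = spatial_weights.T @ pixel_feats             # (K, C)
    # 2. Relation between each pixel and each region, normalised over regions.
    relation = softmax(pixel_feats @ region_feats.T, axis=1)   # (N, K)
    # 3. Context per pixel: relation-weighted sum of region representations.
    context = relation @ region_feats                          # (N, C)
    # 4. Extend each pixel representation with its context.
    return np.concatenate([pixel_feats, context], axis=1)      # (N, 2C)
```

Because the relation matrix is N x K rather than N x N, its size grows with the number of regions (classes) instead of the number of pixels.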

Attention-Based Context Aware Network for Semantic Comprehension of Aerial Scenery

Shi, Qin, Yun, et al. 2021, Sensors


It is essential for researchers to have a proper interpretation of remote sensing images (RSIs) and precise semantic labeling of their component parts. Although FCN (Fully Convolutional Network)-like deep convolutional network architectures have been widely applied in the perception of autonomous cars, there are still two challenges in the semantic segmentation of RSIs. The first is to identify details in high-resolution images with complex scenes and to solve the class-mismatch issues; the second is to capture the edge of objects finely without being confused by the surroundings. HRNet has the characteristics of maintaining high-resolution representation by fusing feature information with parallel multi-resolution convolution branches. We adopt HRNet as a backbone and propose to incorporate the Class-Oriented Region Attention Module (CRAM) and Class-Oriented Context Fusion Module (CCFM) to analyze the relationships between classes and patch regions and between classes and local or global pixels, respectively. Thus, the perception capability of the model for the detailed part in the aerial image can be enhanced. We leverage these modules to develop an end-to-end semantic segmentation model for aerial images and validate it on the ISPRS Potsdam and Vaihingen datasets. The experimental results show that our model improves the baseline accuracy and outperforms some commonly used CNN architectures.


“…This holds true for semantic segmentation (i.e., the pixel-wise labeling of input images). CNN-based algorithms are the top performing solutions for the PASCAL VOC 2012 [12, 13, 14] dataset, Cityscapes [15, 16, 17], and ADE20K [10, 18, 19]. There have also been multiple proposals for using CNNs to analyze endoscopic camera images, predominantly in the medical field [20, 21, 22].…”

Section: Introduction (mentioning)

confidence: 99%

A U-Net Based Approach for Automating Tribological Experiments

Staar, Bayrak, Paulkowski, et al. 2020, Sensors


Tribological experiments (i.e., characterizing the friction and wear behavior of materials) are crucial for determining their potential areas of application. Automating such tests could hence help speed up the development of novel materials and coatings. Here, we utilize convolutional neural networks (CNNs) to automate a common experimental setup whereby an endoscopic camera was used to measure the contact area between a rubber sample and a spherical counterpart. Instead of manually determining the contact area, our approach utilizes a U-Net-like CNN architecture to automate this task, creating a much more efficient and versatile experimental setup. Using a 5× random permutation cross validation as well as additional sanity checks, we show that we approached human-level performance. To ensure a flexible and mobile setup, we implemented the method on an NVIDIA Jetson AGX Xavier development kit where we achieved ~18 frames per second by employing mixed-precision training.
