Attention-guided chained context aggregation for semantic segmentation

Tang, Quan; Liu, Fagui; Zhang, Tong; Jiang, Jun

doi:10.1016/j.imavis.2021.104309

Cited by 29 publications

(6 citation statements)

References 12 publications


“…Then, we compare the proposed method with state-of-the-art methods (IEMNet [60], EMSANet [67], RAFNet [64], ESANet [24], RedNet [8], ACNet [23], SGNet [27], CANet [68], RDFNet [7], ShapeConv [66]) on the SUN RGB-D dataset. As depicted in Table 2, our approach consistently achieves a higher mIoU score on the SUN RGB-D dataset. Visual comparisons on the NYU-Depth V2 dataset.…”

Section: Quantitative Experimental Results On Nyu-depth V2 and Sun Rg...

mentioning

confidence: 99%

MIPANet: optimizing RGB-D semantic segmentation through multi-modal interaction and pooling attention

Zhang, Xie. 2024. Front. Phys.


The semantic segmentation of RGB-D images involves understanding object appearances and spatial relationships within a scene, which necessitates careful consideration of multiple factors. In indoor scenes, the presence of diverse and disorderly objects, coupled with illumination variations and the influence of adjacent objects, can easily result in misclassifications of pixels, consequently affecting the outcome of semantic segmentation. We propose a Multi-modal Interaction and Pooling Attention Network (MIPANet) in response to these challenges. This network is designed to exploit the interactive synergy between RGB and depth modalities, aiming to enhance the utilization of complementary information and improve segmentation accuracy. Specifically, we incorporate a Multi-modal Interaction Module (MIM) into the deepest layers of the network. This module is engineered to facilitate the fusion of RGB and depth information, allowing for mutual enhancement and correction. Moreover, we introduce a Pooling Attention Module (PAM) at various stages of the encoder to enhance the features extracted by the network. The outputs of the PAMs at different stages are selectively integrated into the decoder through a refinement module to improve semantic segmentation performance. Experimental results demonstrate that MIPANet outperforms existing methods on two indoor scene datasets, NYU-Depth V2 and SUN-RGBD, by optimizing the insufficient information interaction between different modalities in RGB-D semantic segmentation. The source code is available at https://github.com/2295104718/MIPANet.



“…Although the upsampling operation based on bilinear interpolation [15] and nearest neighbor interpolation [16] can capture and restore the features extracted by the convolutional layer to a certain extent, its process does not consider the difference between each predicted pixel. Correlation, such as weak data-dependent convolutional decoders [17], cannot produce relatively high-quality feature maps. In this paper, the DUpsampling structure based on data correlation is added to the features extracted by the 3D-UNet network reconstruction encoding path so that the obtained feature map has better expressive ability.…”

Section: Dupsampling Structure

mentioning

confidence: 99%

Research on Lung Tumor Cell Segmentation Method Based on Improved UNet Algorithm

Sun, Chen, Zhang, et al. 2022. Scientific Programming


In order to improve the completeness of the computer-aided diagnosis system for the segmentation of large-sized lung tumors and the segmentation accuracy of small-sized lung tumors, a dual-attention 3D-UNet lung tumor segmentation network model was constructed. The upsampling operation in the traditional 3D-UNet network is replaced with the DUpsampling structure. By minimizing the loss between the pixels of the feature map and the compressed label image, a more expressive feature map is obtained, thereby improving the network convergence speed. On this basis, the spatial attention module and the channel attention module are integrated so that similar features in single channel and multichannel are related to each other, and the global correlation of feature maps is increased to improve the accuracy of segmentation results. The experimental results show that compared with methods such as 3D-UNet, the model effectively improves the accuracy of lung tumor cell segmentation, and the MIoU score on the public dataset LIDC-IDRI reaches 89.4%. The segmentation method will be closer to the facts and in line with the safety and health of human life.
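For context, the citation statement for this paper contrasts DUpsampling with two fixed interpolation schemes, nearest neighbor and bilinear. The sketch below illustrates only those two baselines on a tiny grid, using plain Python lists. It does not implement DUpsampling. The function names are hypothetical, and the corner-aligned coordinate mapping is an assumption, since it is only one of several conventions in use.

```python
def upsample_nearest(grid, factor):
    """Repeat every value `factor` times along both axes."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def upsample_bilinear(grid, out_h, out_w):
    """Bilinear resize with corner alignment (an assumed convention)."""
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for i in range(out_h):
        # Map the output row index back to a fractional input row.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Blend horizontally on the two bracketing rows, then vertically.
            top = grid[y0][x0] * (1 - wx) + grid[y0][x1] * wx
            bottom = grid[y1][x0] * (1 - wx) + grid[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


coarse = [[0.0, 2.0],
          [4.0, 6.0]]
nearest = upsample_nearest(coarse, 2)      # 4x4 grid of repeated values
bilinear = upsample_bilinear(coarse, 3, 3)  # 3x3 grid of blended values
```

Both functions use weights fixed by geometry alone, so neither adapts to the data being restored. That is the limitation the citation statement attributes to these schemes.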


“…The need for semantic segmentation in the context of Cityscapes emerges [3] from the need to glean useful insights from large-scale urban photographs and films. This necessitates the use of such technology.…”

Section: Introduction

mentioning

confidence: 99%

“…([d19, f14])
f21 = Conv2D(c20, 512, (3, 3), "same", 1, "relu")
b21 = BatchNormalization(f21)
f22 = Conv2D(b21, 512, (3, 3), "same", 1, "relu")
# Second Upsample
m23 = UpSample(f22, size = (2, 2))
d23 = Dropout(m23, 0.2)
c24 = Concatenate([d23, f10])
f25 = Conv2D(c24, 256, (3, 3), "same", 1, "relu")
b25 = BatchNormalization(f25)
f26 = Conv2D(b25, 256, (3, 3), "same", 1, "relu")
# Third Upsample
m27 = UpSample(f26, size = (2, 2))
d27 = Dropout(m27, 0.2)
c28 = Concatenate([d27, f6])
f29 = Conv2D(c28, 128, (3, 3), "same", 1, "relu")
b29 = BatchNormalization(f29)
f30 = Conv2D(b29, 128, (3, 3), "same", 1, "relu")
# Fourth Upsample
m31 = UpSample(f30, size = (2, 2))
d31 = Dropout(m31, 0.2)
c32 = Concatenate([d31, f2])
f33 = Conv2D(c32, 64, (3, 3), "same", 1, "relu")
b33 = BatchNormalization(f33)
f34 = Conv2D(b33, 64, (3, 3), "same", 1, "relu")…”

mentioning

confidence: 99%
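The quoted decoder pseudocode repeats one pattern three times: upsample by 2, concatenate with an encoder skip feature, then apply two same-padded convolutions. As a rough illustration, the sketch below traces tensor shapes through that pattern in plain Python. The spatial size of `f22` and the channel counts of the skip features `f10`, `f6`, and `f2` are not given in the excerpt, so the values here are assumptions, and the helper names are hypothetical. Dropout and batch normalization are omitted because they leave shapes unchanged.

```python
def upsample(shape, factor=2):
    """Upsampling multiplies height and width; channels are unchanged."""
    h, w, c = shape
    return (h * factor, w * factor, c)


def concatenate(a, b):
    """Channel-wise concatenation requires matching spatial sizes."""
    if a[:2] != b[:2]:
        raise ValueError(f"spatial mismatch: {a[:2]} vs {b[:2]}")
    return (a[0], a[1], a[2] + b[2])


def conv_same(shape, filters):
    """A stride-1 'same' convolution keeps H and W and sets the channels."""
    h, w, _ = shape
    return (h, w, filters)


# Assumed skip-connection shapes (height, width, channels); not from the source.
skips = {"f10": (64, 64, 256), "f6": (128, 128, 128), "f2": (256, 256, 64)}

x = (32, 32, 512)  # assumed shape of f22, the input to the second upsample
trace = []
for skip_name, filters in [("f10", 256), ("f6", 128), ("f2", 64)]:
    x = upsample(x)                      # m23 / m27 / m31
    x = concatenate(x, skips[skip_name])  # c24 / c28 / c32
    x = conv_same(x, filters)            # f25 / f29 / f33
    x = conv_same(x, filters)            # f26 / f30 / f34
    trace.append((skip_name, x))
```

Under these assumed sizes, each stage doubles the spatial resolution and halves the channel count, ending at a 64-channel map that matches the final `Conv2D(b33, 64, ...)` in the excerpt.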

Semantic segmentation of urban environments: Leveraging U-Net deep learning model for cityscape image analysis

Arulananth, Kuppusamy, Ayyasamy, et al. 2024. PLoS ONE


Semantic segmentation of cityscapes via deep learning is an essential and game-changing research topic that offers a more nuanced comprehension of urban landscapes. Deep learning techniques tackle urban complexity and diversity, which unlocks a broad range of applications. These include urban planning, transportation management, autonomous driving, and smart city efforts. Through rich context and insights, semantic segmentation helps decision-makers and stakeholders make educated decisions for sustainable and effective urban development. This study presents an in-depth exploration of cityscape image segmentation using the U-Net deep learning model. The proposed U-Net architecture comprises an encoder and decoder structure. The encoder uses convolutional layers and downsampling to extract hierarchical information from input images. Each downsampling step reduces spatial dimensions and increases feature depth, aiding context acquisition. Batch normalization and dropout layers stabilize models and prevent overfitting during encoding. The decoder reconstructs higher-resolution feature maps using "UpSampling2D" layers. Through extensive experimentation and evaluation on the Cityscapes dataset, this study demonstrates the effectiveness of the U-Net model in achieving state-of-the-art results in image segmentation. The results clearly show that the proposed model achieves higher accuracy, mean IoU, and mean Dice than existing models.
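The abstract reports mean IoU and mean Dice without defining them. The sketch below shows one common way to compute both from flattened label sequences, using the standard per-class definitions IoU = TP / (TP + FP + FN) and Dice = 2TP / (2TP + FP + FN). The function names and the toy labels are hypothetical. Skipping classes that appear in neither prediction nor target is an assumed convention, and the paper may average differently.

```python
def confusion_counts(pred, target, num_classes):
    """Per-class true positives, false positives, and false negatives."""
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p where it was not the label
            fn[t] += 1  # missed an instance of the true class t
    return tp, fp, fn


def mean_iou_and_dice(pred, target, num_classes):
    """Average IoU and Dice over classes present in pred or target."""
    tp, fp, fn = confusion_counts(pred, target, num_classes)
    ious, dices = [], []
    for c in range(num_classes):
        union = tp[c] + fp[c] + fn[c]
        if union == 0:
            continue  # class absent everywhere: skipped (assumed convention)
        ious.append(tp[c] / union)
        dices.append(2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]))
    if not ious:
        raise ValueError("no class is present in pred or target")
    return sum(ious) / len(ious), sum(dices) / len(dices)


# Toy example: six pixels, three classes.
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
miou, mdice = mean_iou_and_dice(pred, target, num_classes=3)
```

For any single class, Dice is at least as large as IoU, so mean Dice scores typically read higher than mean IoU on the same predictions.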

