2022
DOI: 10.1049/ipr2.12470
Variational joint self‐attention for image captioning

Abstract: The image captioning task has attracted great attention from many researchers, and significant progress has been made in the past few years. Existing image captioning models, which mainly apply attention‐based encoder‐decoder architectures, have achieved great advances in image captioning. These attention‐based models, however, are limited in caption generation due to potential errors resulting from inaccurate detection of objects and incorrect attention to the objects. To alleviate this limitation, a Var…

Cited by 3 publications (2 citation statements)
References 29 publications (72 reference statements)
“…The attention module is mainly used in tasks where context information is important, such as visual question answering (VQA), image captioning, and scene character recognition [33,34]. However, when the concept of attention was expanded to self-attention, it began to be used in CNN.…”
Section: Attention Module
confidence: 99%
“…These models require giant data sets, long training time, and high hardware requirements, which can only be satisfied in laboratories. Therefore, many researchers use self-attention to improve fully convolutional models, slightly increasing the computational complexity of the model to obtain better detection results [27][28][29][30]. In 2022, Guo proposed the visual attention network (VAN) [18], which adds self-attention to the convolutional layer to form the VAN module, significantly improving the performance of the fully convolutional network.…”
Section: Introduction
confidence: 99%
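The excerpts above describe adding self-attention to convolutional feature maps so each spatial location can draw context from the whole map. The following is a minimal numpy sketch of that idea, a plain dot-product self-attention over the spatial positions of a CNN feature map; it is an illustrative assumption of the general technique, not the actual VAN module from Guo et al. or the cited detectors:

```python
import numpy as np

def spatial_self_attention(feat):
    """Toy self-attention over the spatial positions of a CNN feature map.

    feat: array of shape (C, H, W). Each of the H*W positions attends to
    every other position, so the output at each location mixes in context
    from the whole map -- the property the cited works exploit to improve
    fully convolutional models. Illustrative sketch only.
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w).T                  # (N, C), N = H*W positions
    scores = x @ x.T / np.sqrt(c)                 # (N, N) pairwise similarities
    scores -= scores.max(axis=1, keepdims=True)   # stabilize the softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)       # rows sum to 1
    out = attn @ x                                # context-mixed features
    return out.T.reshape(c, h, w)

# Apply to a random 8-channel, 4x4 feature map.
fmap = np.random.rand(8, 4, 4)
out = spatial_self_attention(fmap)
print(out.shape)  # (8, 4, 4)
```

Real architectures insert a block like this between convolutional layers (usually with learned query/key/value projections and a residual connection), trading a small amount of extra computation for global context, as the quoted passage notes.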