Thangka Image Segmentation Method Based on Enhanced Receptive Field

Wang, Hao; Hu, Jingyun; Ren, Xue; Liu, Yue; Pan, Guangxiu

doi:10.1109/access.2022.3201086

Cited by 3 publications

(2 citation statements)

References 21 publications

“…Deep Convolutional Neural Networks (CNNs) have demonstrated outstanding capabilities in handling complex visual tasks, where adjusting parameters such as network depth and convolutional kernel size to modulate the network's receptive field has become a common strategy for improving prediction accuracy. This is particularly crucial in applications requiring dense predictions such as semantic image segmentation [5] [6], stereo vision analysis [7], and optical flow estimation [8], as these tasks rely on a comprehensive understanding of the extensive context surrounding each pixel to ensure no critical information is overlooked. In this study, we adopted the innovative LarK Block from UniRepLKNet [9], which extends the model's receptive field by leveraging large kernel blocks without the need to increase network layers, effectively enhancing the network's ability to capture details.…”

Section: Of 26 (mentioning, confidence: 99%)
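The receptive-field argument in the excerpt above can be made concrete with the standard recurrence for stacked convolutions. This is a generic textbook formula, not code from UniRepLKNet or the citing paper. It gives the theoretical receptive field along one spatial axis and ignores the smaller effective receptive field. The function name `receptive_field` is hypothetical. The demo shows that one 13x13 layer covers the same span that six stacked 3x3 layers need to reach:

```python
def receptive_field(layers):
    """Theoretical receptive field (input pixels along one axis) of a conv stack.

    layers: sequence of (kernel_size, stride, dilation) tuples, first layer first.
    """
    rf = 1    # start from a single input pixel
    jump = 1  # input-pixel distance between adjacent positions at the current layer
    for k, s, d in layers:
        rf += (k - 1) * d * jump  # each layer widens the field by its effective span
        jump *= s                 # striding spreads later layers further apart
    return rf


if __name__ == "__main__":
    print(receptive_field([(3, 1, 1)] * 3))  # three stacked 3x3 convs -> 7
    print(receptive_field([(3, 1, 1)] * 6))  # six stacked 3x3 convs   -> 13
    print(receptive_field([(13, 1, 1)]))     # one 13x13 conv          -> 13
```

The trade-off is that the single large kernel reaches the wider context in one layer, which is the motivation the excerpt gives for adopting the LarK Block instead of adding depth.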

“…The Dilated Reparam Block is based on an equivalent transformation: it enhances feature extraction by combining a non-sparse large-kernel convolutional layer with multiple parallel sparse (dilated) small-kernel convolutional layers. The key hyperparameters of this method are the size of the large kernel K, the kernel sizes of the parallel layers k, and the sparsity (dilation) rates r. For example, assume four parallel layers with K=9, r=(1,2,3,4), and k=(5,3,3,3). To use a larger K, more parallel layers with larger kernel sizes or sparsity rates can be used.…”

Section: LarK Block (mentioning, confidence: 99%)
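The equivalent transformation described in the excerpt above can be checked numerically. A k x k kernel with sparsity (dilation) rate r spans (k - 1)r + 1 pixels, so the example configuration k=(5,3,3,3), r=(1,2,3,4) gives spans of 5, 5, 7 and 9, all of which fit inside K=9. The sketch below is a simplification. It assumes single-channel (depthwise) kernels, stride 1, zero "same" padding, and BatchNorm already folded into the weights. It expands each dilated kernel by inserting zeros, pads it to K x K, and sums it into the large kernel. All function names are hypothetical and are not UniRepLKNet's API:

```python
import numpy as np


def span(k, r):
    """Pixels covered along one axis by a k x k kernel with dilation r."""
    return (k - 1) * r + 1


def dilate_kernel(w, r):
    """Insert (r - 1) zeros between taps so a dilated kernel becomes a dense one."""
    k = w.shape[0]
    dense = np.zeros((span(k, r), span(k, r)), dtype=float)
    dense[::r, ::r] = w
    return dense


def merge_branches(large, branches):
    """Fold parallel (kernel, dilation) branches into one dense K x K kernel."""
    K = large.shape[0]
    merged = np.array(large, dtype=float)  # copy, so the input is not modified
    for w, r in branches:
        d = dilate_kernel(w, r)
        pad = (K - d.shape[0]) // 2  # centre the smaller kernel inside K x K
        if pad < 0:
            raise ValueError("branch span exceeds the large kernel size K")
        merged[pad:pad + d.shape[0], pad:pad + d.shape[1]] += d
    return merged


def conv2d_same(x, w, r=1):
    """Single-channel cross-correlation, stride 1, dilation r, zero 'same' padding (odd k)."""
    k = w.shape[0]
    p = (k - 1) * r // 2
    xp = np.pad(x, p)
    H, W = x.shape
    out = np.zeros((H, W))
    for i in range(k):
        for j in range(k):
            out += w[i, j] * xp[i * r:i * r + H, j * r:j * r + W]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, ks, rs = 9, (5, 3, 3, 3), (1, 2, 3, 4)
    large = rng.standard_normal((K, K))
    branches = [(rng.standard_normal((k, k)), r) for k, r in zip(ks, rs)]
    x = rng.standard_normal((16, 16))

    # Training-time form: large-kernel conv plus parallel dilated convs, outputs summed.
    y_train = conv2d_same(x, large) + sum(conv2d_same(x, w, r) for w, r in branches)
    # Inference-time form: one dense K x K conv with the merged kernel.
    y_infer = conv2d_same(x, merge_branches(large, branches))
    print([span(k, r) for k, r in zip(ks, rs)], np.allclose(y_train, y_infer))  # [5, 5, 7, 9] True
```

The equivalence holds because convolution is linear. With matching centres and padding, summing branch outputs is the same as convolving once with the summed, zero-padded kernels. This is why the extra branches add no inference cost.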

YOLOv8-Mu: An Improved YOLOv8 Underwater Detector Based on Large Kernel Block and Multi-branch Heavy Parameterization Module

Jiang, Zhuang, Chen, et al. 2024. Preprint.

Underwater visual detection technology plays a pivotal role in fields such as marine exploration. With the increasing demand for underwater monitoring, efficient and reliable methods for underwater target recognition have become particularly important. To address this requirement, this study developed an innovative underwater object detection architecture based on YOLOv8, named YOLOv8-MU, aimed at significantly enhancing detection accuracy. By integrating the LarK module proposed in UniRepLKNet to optimize the backbone network, YOLOv8-MU aims to achieve a larger receptive field without increasing the model’s depth. Further, this research introduces C2fSTR, a method that combines the Swin Transformer with the C2f module. Additionally, we incorporate the SPPFCSPC_EMA module, which combines Cross-Stage Partial Fast Spatial Pyramid Pooling (SPPFCSPC) with attention mechanisms, significantly improving the detection accuracy and robustness for various biological targets. Moreover, by introducing a fusion block based on DAMO-YOLO into the neck of the model, we further enhance the model’s capability for multi-scale feature extraction. Finally, the adoption of the MPDIoU loss function, designed around vertex distance, effectively tackles the challenges of localization accuracy and boundary clarity in underwater organism detection. Experimental results on the URPC2019 dataset demonstrate that the YOLOv8-MU model achieves an mAP@0.5 of 78.4%, improvements of 5.6%, 1.1%, and 4.0% over YOLOv5s, YOLOv7, and YOLOv8n, respectively, indicating state-of-the-art (SOTA) performance. Further evaluation on the URPC2020 dataset confirmed the generalization capability of the YOLOv8-MU architecture: its mAP@0.5 reached 80.4%, surpassing various models including YOLOv5x and YOLOv8n and showing the wide applicability of the proposed architecture.
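The MPDIoU loss mentioned above can be sketched from its published definition as we understand it. The score is IoU minus two penalties: the squared distance between the top-left corners and the squared distance between the bottom-right corners of the predicted and ground-truth boxes. Each distance is normalised by w² + h², where w and h are the input image's width and height. The loss is 1 - MPDIoU:

MPDIoU = IoU - d1² / (w² + h²) - d2² / (w² + h²)

This is an illustrative scalar version for (x1, y1, x2, y2) boxes. It is not the YOLOv8-MU training code, and the helper names are hypothetical:

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mpdiou_loss(pred, gt, img_w, img_h):
    """1 - MPDIoU: IoU penalised by normalised top-left and bottom-right corner distances."""
    norm = img_w ** 2 + img_h ** 2
    d1 = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2  # top-left corners
    d2 = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2  # bottom-right corners
    return 1.0 - (box_iou(pred, gt) - d1 / norm - d2 / norm)


if __name__ == "__main__":
    gt = (0, 0, 10, 10)
    print(mpdiou_loss(gt, gt, 100, 100))                # identical boxes: 0.0
    print(mpdiou_loss((5, 0, 15, 10), gt, 100, 100))    # overlapping, IoU 1/3
    print(mpdiou_loss((50, 50, 60, 60), gt, 100, 100))  # disjoint boxes: 1.5
```

Unlike plain 1 - IoU, which is stuck at 1 for every non-overlapping prediction, the corner-distance terms still tell the optimiser how far a disjoint box is from its target.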

Ancient thangka Buddha face recognition based on the Dlib machine learning library and comparison with secular aesthetics

Yang Fan. 2023. Herit Sci.

Face recognition technology is currently advancing rapidly and has emerged as a crucial field of research. Thangkas, as a significant repository of Buddhist imagery, encompass a vast amount of Buddha image data depicting diverse Buddhist themes from various historical epochs. Accurate recognition of facial features in these Buddha images is particularly significant for understanding the historical evolution of thangka, especially the correlation of facial features between Buddhist and secular society. Hence, in this study, 68 facial feature points were obtained using the Dlib deep learning library, and 16 facial geometric feature indices were derived from them. These indices served as the foundation for establishing a facial measurement standard and an aesthetic evaluation index for thangka Buddhas. A meticulous evaluation and identification of thangka facial details spanning nearly a millennium was conducted, and the transformation of thangka facial features was analyzed from a secular aesthetic perspective. The study found that: (1) the deep learning library performed effectively in identifying facial characteristics in thangka Buddha images, and the facial evaluation index proved to be a reliable tool for evaluating both measurement standards and aesthetics; (2) the facial measurement standards depicted in thangka Buddha images have evolved and become increasingly standardized over time, maintaining a highly symmetrical aesthetic; (3) the aesthetic of thangka facial features draws upon the secular Tibetan face as its primary reference (Euclidean distance 0.42), but during the 17th to 19th centuries Han Chinese facial features were gradually incorporated (Euclidean distance 0.492), and the fusion of Han Chinese and Tibetan facial aesthetics has deepened.
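The sketch below illustrates how ratio-style geometric indices and a Euclidean distance between index vectors can be computed from 68 facial landmarks, as in the comparisons above. It assumes the iBUG 300-W point layout used by dlib's 68-point shape predictor. In that layout, points 0 and 16 are the ends of the jaw line, 8 is the chin, 27 is the top of the nose bridge, and 36 and 45 are the outer eye corners. The two indices are hypothetical stand-ins, not the paper's 16 indices. Landmark detection itself is omitted, and the paper's exact normalisation is not given here:

```python
import numpy as np

# Assumed iBUG 300-W 68-point layout (dlib's 68-point model), 0-based indices.
JAW_START, JAW_END, CHIN, NOSE_TOP = 0, 16, 8, 27
EYE_OUTER_A, EYE_OUTER_B = 36, 45


def _dist(pts, i, j):
    """Euclidean distance between landmarks i and j."""
    return float(np.linalg.norm(pts[i] - pts[j]))


def facial_indices(pts):
    """Two hypothetical scale-free ratio indices from a (68, 2) landmark array."""
    pts = np.asarray(pts, dtype=float)
    width = _dist(pts, JAW_START, JAW_END)         # jaw-line width
    height = _dist(pts, NOSE_TOP, CHIN)            # nose bridge to chin
    eye_span = _dist(pts, EYE_OUTER_A, EYE_OUTER_B)
    return np.array([width / height, eye_span / width])


def profile_distance(u, v):
    """Euclidean distance between two facial-index vectors."""
    return float(np.linalg.norm(np.asarray(u, dtype=float) - np.asarray(v, dtype=float)))


if __name__ == "__main__":
    pts = np.zeros((68, 2))
    pts[[JAW_START, JAW_END, NOSE_TOP, CHIN, EYE_OUTER_A, EYE_OUTER_B]] = [
        (0, 50), (100, 50), (50, 20), (50, 120), (20, 40), (80, 40)]
    idx = facial_indices(pts)
    print(idx)
    print(profile_distance(idx, [1.1, 0.6]))
```

Ratios are used rather than raw pixel distances so that faces painted at different scales remain comparable.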

YOLOv8-MU: An Improved YOLOv8 Underwater Detector Based on a Large Kernel Block and a Multi-Branch Reparameterization Module

Jiang, Zhuang, Chen, et al. 2024. Sensors.

Underwater visual detection technology is crucial for marine exploration and monitoring. Given the growing demand for accurate underwater target recognition, this study introduces an innovative architecture, YOLOv8-MU, which significantly enhances the detection accuracy. This model incorporates the large kernel block (LarK block) from UniRepLKNet to optimize the backbone network, achieving a broader receptive field without increasing the model’s depth. Additionally, the integration of C2fSTR, which combines the Swin transformer with the C2f module, and the SPPFCSPC_EMA module, which blends Cross-Stage Partial Fast Spatial Pyramid Pooling (SPPFCSPC) with attention mechanisms, notably improves the detection accuracy and robustness for various biological targets. A fusion block from DAMO-YOLO further enhances the multi-scale feature extraction capabilities in the model’s neck. Moreover, the adoption of the MPDIoU loss function, designed around the vertex distance, effectively addresses the challenges of localization accuracy and boundary clarity in underwater organism detection. The experimental results on the URPC2019 dataset indicate that YOLOv8-MU achieves an mAP@0.5 of 78.4%, showing an improvement of 4.0% over the original YOLOv8 model. Additionally, on the URPC2020 dataset, it achieves 80.9%, and, on the Aquarium dataset, it reaches 75.5%, surpassing other models, including YOLOv5 and YOLOv8n, thus confirming the wide applicability and generalization capabilities of our proposed improved model architecture. Furthermore, an evaluation on the improved URPC2019 dataset demonstrates leading performance (SOTA), with an mAP@0.5 of 88.1%, further verifying its superiority on this dataset. These results highlight the model’s broad applicability and generalization capabilities across various underwater datasets.
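The mAP@0.5 figures reported above can be illustrated with a minimal single-class average-precision routine, which works in three steps:

1. Sort detections by confidence.
2. Greedily match each detection to its highest-IoU ground-truth box, counting a true positive only when IoU ≥ 0.5 and that box is still unmatched (VOC-style matching).
3. Take AP as the area under the all-point interpolated precision-recall curve.

mAP@0.5 is then the mean of this value over classes. This is a generic sketch, not the evaluation code used for URPC2019, URPC2020 or Aquarium. Evaluation toolkits differ in matching and interpolation details, and the function names here are hypothetical:

```python
import numpy as np


def box_iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truths, iou_thr=0.5):
    """Single-class AP at one IoU threshold.

    detections: list of (image_id, score, box); ground_truths: dict image_id -> list of boxes.
    """
    n_gt = sum(len(boxes) for boxes in ground_truths.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in ground_truths.items()}
    tp = []
    for img, _, box in sorted(detections, key=lambda d: d[1], reverse=True):
        ious = [box_iou(box, g) for g in ground_truths.get(img, [])]
        j = int(np.argmax(ious)) if ious else -1
        if j >= 0 and ious[j] >= iou_thr and not matched[img][j]:
            matched[img][j] = True  # each ground-truth box can be matched only once
            tp.append(1.0)
        else:
            tp.append(0.0)          # low-IoU, duplicate or unmatched detection
    tp = np.array(tp, dtype=float)
    cum_tp, cum_fp = np.cumsum(tp), np.cumsum(1.0 - tp)
    recall = cum_tp / n_gt
    precision = cum_tp / np.maximum(cum_tp + cum_fp, np.finfo(float).eps)
    # All-point interpolation: make precision non-increasing as recall grows.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    steps = np.where(mrec[1:] != mrec[:-1])[0]  # points where recall changes
    return float(np.sum((mrec[steps + 1] - mrec[steps]) * mpre[steps + 1]))


if __name__ == "__main__":
    gts = {"img0": [(0, 0, 10, 10), (20, 20, 30, 30)]}
    dets = [("img0", 0.9, (0, 0, 10, 10)),    # true positive (IoU 1.0)
            ("img0", 0.8, (50, 50, 60, 60)),  # false positive
            ("img0", 0.7, (21, 21, 30, 30))]  # true positive (IoU 0.81)
    print(average_precision(dets, gts))       # 0.5 * 1.0 + 0.5 * (2/3), about 0.833
```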
