2022
DOI: 10.1109/access.2022.3201086

Thangka Image Segmentation Method Based on Enhanced Receptive Field

Abstract: The portrait thangka image is a kind of religious scroll painting that expresses figures' identity and duties through portraits, sitting platforms, and backlighting. The segmentation of significant semantic objects in the image is one of the essential ways for scholars to study and understand the image's content. To better understand this content, we elaborately collected a dataset of portrait-like thangkas, which consists of 4086 images covering four object categories. We provide rich annotation for this data…

Cited by 3 publications (2 citation statements)
References 21 publications

“…Deep Convolutional Neural Networks (CNNs) have demonstrated outstanding capabilities in handling complex visual tasks, where adjusting parameters such as network depth and convolutional kernel size to modulate the network's receptive field has become a common strategy for improving prediction accuracy. This is particularly crucial in applications requiring dense predictions such as semantic image segmentation [5] [6], stereo vision analysis [7], and optical flow estimation [8], as these tasks rely on a comprehensive understanding of the extensive context surrounding each pixel to ensure no critical information is overlooked. In this study, we adopted the innovative LarK Block from UniRepLKNet [9], which extends the model's receptive field by leveraging large kernel blocks without the need to increase network layers, effectively enhancing the network's ability to capture details.…”
Section: Of 26 | Citation type: mentioning
confidence: 99%
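
The statement above notes that network depth and kernel size both modulate the receptive field. As a rough illustration (not taken from the cited papers), the standard receptive-field recurrence for a stack of convolutions can be sketched as below. The function name `receptive_field` and the example layer lists are assumptions chosen for this sketch.

```python
def receptive_field(layers):
    """Receptive field (in input pixels, along one axis) of a conv stack.

    `layers` is a list of (kernel_size, stride, dilation) tuples, ordered
    from the input to the output. Hypothetical helper for illustration.
    """
    rf = 1     # a single output pixel sees one input pixel before any layer
    jump = 1   # distance in input pixels between adjacent feature positions
    for kernel_size, stride, dilation in layers:
        effective_k = (kernel_size - 1) * dilation + 1
        rf += (effective_k - 1) * jump
        jump *= stride
    return rf


# Six stacked 3x3 layers and one 13x13 layer reach the same receptive field.
deep_small = [(3, 1, 1)] * 6
single_large = [(13, 1, 1)]
print(receptive_field(deep_small))    # 13
print(receptive_field(single_large))  # 13
```

The comparison shows why a large kernel can widen the receptive field without adding layers: one 13x13 layer covers the same span as six 3x3 layers at stride 1.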
“…The Dilated Reparam Block is proposed based on equivalent transformation, aiming to enhance feature extraction by combining a non-sparse large-kernel convolutional layer with multiple sparse small-kernel convolutional layers. The key hyperparameters of this method include the size of the large kernel K, the size of parallel convolutional layers k, and the sparsity rate r. Assuming there are four parallel layers with K=9, r=(1,2,3,4), and k= (5,3,3,3). To utilize a larger K, more layers can be enhanced by increasing the kernel size or expanding the sparsity rate.…”
Section: Lark Block | Citation type: mentioning
confidence: 99%
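
The equivalent transformation mentioned in the statement above can be sketched in NumPy, assuming single-channel 2D inputs, stride 1, zero padding, and no normalization layers (the actual block may include per-branch normalization that would need to be fused first). A dilated k×k kernel with rate r covers (k−1)·r+1 pixels, so it can be rewritten as a sparse dense kernel and added into the K×K kernel. All function names here are hypothetical.

```python
import numpy as np


def effective_size(k, r):
    """Span in pixels of a k x k kernel applied with dilation rate r."""
    return (k - 1) * r + 1


def dilate_kernel(w, r):
    """Insert zeros between taps so a dilated kernel becomes a dense one."""
    size = effective_size(w.shape[0], r)
    dense = np.zeros((size, size), dtype=w.dtype)
    dense[::r, ::r] = w
    return dense


def merge_into_large(large, smalls, rates):
    """Add each zero-inserted small kernel, centered, into the large kernel."""
    K = large.shape[0]
    merged = large.copy()
    for w, r in zip(smalls, rates):
        dense = dilate_kernel(w, r)
        pad = (K - dense.shape[0]) // 2  # assumes odd sizes, span <= K
        merged += np.pad(dense, pad)
    return merged


def conv2d_same(x, w, r=1):
    """Naive zero-padded cross-correlation with dilation r (output = input size)."""
    k = w.shape[0]
    pad = effective_size(k, r) // 2
    xp = np.pad(x, pad)
    H, W = x.shape
    out = np.zeros_like(x)
    for i in range(k):
        for j in range(k):
            out += w[i, j] * xp[i * r:i * r + H, j * r:j * r + W]
    return out


# Hyperparameters quoted in the statement: K=9, k=(5,3,3,3), r=(1,2,3,4).
rng = np.random.default_rng(0)
K, ks, rs = 9, (5, 3, 3, 3), (1, 2, 3, 4)
x = rng.standard_normal((16, 16))
large = rng.standard_normal((K, K))
smalls = [rng.standard_normal((k, k)) for k in ks]

branch_sum = conv2d_same(x, large) + sum(
    conv2d_same(x, w, r) for w, r in zip(smalls, rs)
)
merged_out = conv2d_same(x, merge_into_large(large, smalls, rs))

print([effective_size(k, r) for k, r in zip(ks, rs)])  # [5, 5, 7, 9]
print(np.allclose(branch_sum, merged_out))             # True
```

With the quoted settings, every parallel branch spans at most 9 pixels, so all four fit inside the K=9 kernel and the five-branch sum collapses into a single large-kernel convolution.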