2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020
DOI: 10.1109/cvpr42600.2020.01221
PolarMask: Single Shot Instance Segmentation With Polar Representation

Cited by 513 publications (316 citation statements). References 24 publications.
“…The modern development of two-stage segmentation is the relatively fast model YOLACT++ [21]. Another approach to improving the quality of segmentation of detected objects involves deforming the detected contour with a dedicated neural network, for example, based on the polar representation of the contour in PolarMask [22], the concept of circular convolution in Deep Snake [23], or the deep polygon transformer in PolyTransform [24].…”
Section: B Real-time Object Segmentation (mentioning)
confidence: 99%
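The polar representation mentioned above encodes an instance mask as a center point plus n ray lengths at evenly spaced angles, so a contour can be recovered without pixel-wise prediction. The sketch below illustrates that decoding step only; it is a minimal illustration of the idea, not PolarMask's actual code, and the angle convention and ray count are assumptions.

```python
import math

def polar_to_contour(center, distances):
    """Decode a polar mask representation (center + n ray lengths at
    evenly spaced angles) into a list of polygon vertices.

    Illustrative sketch only: the angle origin (0 rad along +x) and the
    counterclockwise ordering are assumed conventions, not the paper's.
    """
    cx, cy = center
    n = len(distances)
    contour = []
    for k, d in enumerate(distances):
        theta = 2.0 * math.pi * k / n  # rays spaced 360/n degrees apart
        contour.append((cx + d * math.cos(theta),
                        cy + d * math.sin(theta)))
    return contour

# A circle of radius 5 centered at (10, 10): all 36 rays are equal,
# so the decoded polygon approximates the circle.
pts = polar_to_contour((10.0, 10.0), [5.0] * 36)
```

With 36 rays (10-degree spacing, one of the configurations discussed in the paper) each mask is described by just 36 scalars plus a center, which is what makes the representation amenable to single-shot, box-free prediction.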
“…With continual advancement in statistical modeling, speech recognition has been widely adopted in robots and smart devices (Reddy and Raj, 1976 ) to realize natural language-based human–computer interaction. Furthermore, substantial development in the field of image perception has been carried out, even achieving human-level performance in some tasks (Hou et al, 2020 ; Uzkent et al, 2020 ; Xie et al, 2020 ). By fusing visual and auditory information, robots are able to understand human natural language instructions and carry out required tasks.…”
Section: Introduction (mentioning)
confidence: 99%
“…Supervised deep learning enables accurate computer vision models. Key to this success is access to raw sensor data (i.e., images) with ground truth (GT) for the visual task at hand (e.g., image classification [ 1 ], object detection [ 2 ] and recognition [ 3 ], pixel-wise instance/semantic segmentation [ 4 , 5 ], monocular depth estimation [ 6 ], 3D reconstruction [ 7 ], etc.). The supervised training of such computer vision models, which are based on convolutional neural networks (CNNs), is known to require very large amounts of images with GT [ 8 ].…”
Section: Introduction (mentioning)
confidence: 99%