2020
DOI: 10.1049/el.2019.4150
Multi‐modal deep network for RGB‐D segmentation of clothes

Abstract: In this Letter, the authors propose a deep learning based method to perform semantic segmentation of clothes from RGB-D images of people. First, they present a synthetic dataset containing more than 50,000 RGB-D samples of characters in different clothing styles, featuring various poses and environments, for a total of nine semantic classes. The proposed data generation pipeline allows for fast production of RGB images, depth images, and ground-truth label maps. Secondly, a novel multi-modal encoder-decoder convolutional…
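The abstract names a multi-modal encoder-decoder convolutional network but is truncated before describing it. As a rough illustration only, a dual-branch encoder-decoder for nine-class RGB-D segmentation could be sketched as below; all layer sizes, names, and the fusion point are assumptions, not the letter's actual architecture.

```python
import torch
import torch.nn as nn

class RGBDSegNet(nn.Module):
    """Minimal multi-modal encoder-decoder sketch: one encoder per modality
    (RGB, depth), concatenated features, one decoder producing a per-pixel
    label map over 9 semantic classes. Purely illustrative, not the paper's
    network."""

    def __init__(self, num_classes: int = 9):
        super().__init__()
        def encoder(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            )
        self.rgb_enc = encoder(3)      # RGB branch (3 channels)
        self.depth_enc = encoder(1)    # depth branch (1 channel)
        self.decoder = nn.Sequential(  # fuse, then upsample back to input size
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(64, num_classes, 1),  # per-pixel class logits
        )

    def forward(self, rgb, depth):
        fused = torch.cat([self.rgb_enc(rgb), self.depth_enc(depth)], dim=1)
        return self.decoder(fused)

# usage: logits over 9 classes at input resolution
logits = RGBDSegNet()(torch.rand(1, 3, 224, 224), torch.rand(1, 1, 224, 224))
print(logits.shape)  # torch.Size([1, 9, 224, 224])
```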

Cited by 6 publications (7 citation statements). References 16 publications (22 reference statements).
“…For example, Linemod [8] locates and estimates object poses by extracting gradient features from RGB images and normal features from depth images. Other methods use deep learning to extract RGB-D features: Shao et al. [35] propose two fusion strategies, the first concatenating the RGB and depth images into a single raw input for a CNN, and the second, like [3,36,37], using separate CNNs to extract RGB and depth features and then concatenating those features as the input for object segmentation and pose estimation. However, these methods neglect the inner structure of the depth channel and treat depth features as a mere supplement to the RGB feature channels.…”
Section: Pose from RGB-D Data (mentioning)
confidence: 99%
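The two fusion strategies contrasted in this statement differ only in where the modalities are combined. As a concrete illustration, a minimal PyTorch sketch follows; the layer widths are arbitrary assumptions, not the cited networks.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU())

# Early fusion: stack RGB (3 channels) and depth (1 channel) into a single
# 4-channel image and feed one CNN.
early_net = conv_block(4, 64)

def early_fusion(rgb, depth):
    return early_net(torch.cat([rgb, depth], dim=1))

# Late fusion: extract features per modality with separate CNNs, then
# concatenate the feature maps for the downstream segmentation/pose head.
rgb_net, depth_net = conv_block(3, 32), conv_block(1, 32)

def late_fusion(rgb, depth):
    return torch.cat([rgb_net(rgb), depth_net(depth)], dim=1)

rgb = torch.rand(1, 3, 64, 64)    # RGB image batch
depth = torch.rand(1, 1, 64, 64)  # aligned depth map batch
print(early_fusion(rgb, depth).shape)  # torch.Size([1, 64, 64, 64])
print(late_fusion(rgb, depth).shape)   # torch.Size([1, 64, 64, 64])
```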
“…The emergence of deep learning (DL) has opened up new possibilities for image target segmentation and recognition, performing well even on a variety of small-scale datasets [11]. At the same time, traditional DL network models still suffer from large parameter counts and complex computations as network depth grows [12, 13], which leads to insufficient real-time performance for image segmentation.…”
Section: Introduction (mentioning)
confidence: 99%
“…Introduction: 3D registration is a classical and fundamental problem with countless applications. As commodity depth cameras have become less expensive and more accurate, depth images play an increasingly important role in numerous tasks [1]. To obtain comprehensive information about a 3D scene, point clouds captured from multiple views need to be aligned.…”
(mentioning)
confidence: 99%
“…The well-established method is iterative closest point (ICP) [2], on the basis of which a myriad of variants have been proposed. In ICP, given a source shape and a target shape, the following steps are performed: (1) for each point in the source shape, identify the closest corresponding point in the target shape; (2) estimate the transformation that minimizes the mean squared Euclidean distance between these correspondences; (3) transform the source shape using the transformation estimated in step 2; (4) iterate the above steps until the mean squared distance falls below a pre-defined threshold. ICP and its variants are the dominant methods for 3D registration.…”
(mentioning)
confidence: 99%
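The four ICP steps quoted above translate almost line-for-line into code. Below is a minimal NumPy/SciPy sketch of point-to-point ICP, using a KD-tree for the correspondence search (step 1) and the standard SVD-based (Kabsch) solution for the rigid transform (step 2); function names and defaults are illustrative assumptions, and the stopping rule uses the common "error stops improving" variant of the text's threshold test.

```python
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(src, dst):
    """SVD (Kabsch) solution minimizing the mean squared distance between
    matched point sets: returns R, t such that dst ~ src @ R.T + t."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:              # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, dst_c - R @ src_c

def icp(source, target, max_iters=50, tol=1e-6):
    """Point-to-point ICP over (N, 3) arrays, following steps 1-4 above."""
    tree = cKDTree(target)                # fixed target for fast lookup
    src = source.copy()
    prev_err = np.inf
    for _ in range(max_iters):
        dists, idx = tree.query(src)      # step 1: closest correspondences
        R, t = best_rigid_transform(src, target[idx])  # step 2: solve transform
        src = src @ R.T + t               # step 3: apply to source shape
        err = np.mean(dists ** 2)
        if abs(prev_err - err) < tol:     # step 4: stop when error plateaus
            break
        prev_err = err
    return src
```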