Figure 1: An example from our dataset, where (a) is the input color image, (b) illustrates the labeled 2D keypoints, and (c) shows the 3D model fitting result with the labeled 2D keypoints.
In 2D/3D object detection tasks, Intersection-over-Union (IoU) is widely used as the metric for evaluating detectors at test time. During training, however, a common distance loss (e.g., L1 or L2) is usually adopted to minimize the discrepancy between the predicted and ground-truth bounding box (Bbox). To eliminate this gap between training and testing, the IoU loss has been introduced for 2D object detection in [1] and [2]. Unfortunately, these approaches only work for axis-aligned 2D Bboxes and cannot be applied to the more general object detection task with rotated Bboxes. To resolve this issue, we first investigate the IoU computation for two rotated Bboxes and then implement a unified framework, the IoU loss layer, for both 2D and 3D object detection tasks. By integrating the implemented IoU loss into several state-of-the-art 3D object detectors, consistent improvements have been achieved for both bird's-eye-view 2D detection and point cloud 3D detection on the public KITTI [3] benchmark.
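To make the rotated-Bbox IoU concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not give their formulation, so the box parameterization `(cx, cy, w, h, angle)`, the function names, the use of Sutherland-Hodgman polygon clipping, and the `1 - IoU` loss form are all assumptions made for illustration.

```python
import math


def box_corners(cx, cy, w, h, angle):
    """Corners of a rotated rectangle in counter-clockwise order (angle in radians)."""
    c, s = math.cos(angle), math.sin(angle)
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + x * c - y * s, cy + x * s + y * c) for x, y in half]


def polygon_area(poly):
    """Shoelace formula; a polygon with fewer than 3 vertices has zero area."""
    if len(poly) < 3:
        return 0.0
    total = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def clip(subject, clipper):
    """Sutherland-Hodgman: clip convex `subject` against convex CCW `clipper`."""

    def inside(p, a, b):
        # p lies on or to the left of the directed edge a -> b
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p, q, a, b):
        # Intersection of segment p -> q with the line through a and b.
        dx1, dy1 = q[0] - p[0], q[1] - p[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        denom = dx1 * dy2 - dy1 * dx2
        t = ((a[0] - p[0]) * dy2 - (a[1] - p[1]) * dx2) / denom
        return (p[0] + t * dx1, p[1] + t * dy1)

    output = list(subject)
    for i in range(len(clipper)):
        if not output:
            break
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        current, output = output, []
        for j in range(len(current)):
            p, q = current[j - 1], current[j]  # edge p -> q (wraps at j == 0)
            if inside(q, a, b):
                if not inside(p, a, b):
                    output.append(intersect(p, q, a, b))
                output.append(q)
            elif inside(p, a, b):
                output.append(intersect(p, q, a, b))
    return output


def rotated_iou(box_a, box_b):
    """IoU of two rotated boxes given as (cx, cy, w, h, angle)."""
    pa, pb = box_corners(*box_a), box_corners(*box_b)
    inter = polygon_area(clip(pa, pb))
    union = polygon_area(pa) + polygon_area(pb) - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred, target):
    """Assumed loss form: 1 - IoU (0 for a perfect match, 1 for no overlap)."""
    return 1.0 - rotated_iou(pred, target)
```

For example, `rotated_iou((0, 0, 2, 2, 0), (0, 0, 2, 2, math.pi / 4))` compares a square with a copy of itself rotated by 45 degrees. This sketch only evaluates the quantity; an actual training loss layer would need the same computation expressed in an automatic-differentiation framework so that gradients can flow to the box parameters.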
Depth image super-resolution is an extremely challenging task due to the information loss in sub-sampling. Deep convolutional neural networks have been widely applied to color image super-resolution. Quite surprisingly, this success has not been matched in depth super-resolution, mainly because of the inherent difference between color and depth images. In this paper, we bridge the gap and extend the success of deep convolutional neural networks to depth super-resolution. The proposed deep depth super-resolution method learns the mapping from a low-resolution depth image to a high-resolution one in an end-to-end style. Furthermore, to better regularize the learned depth map, we propose to exploit the depth field statistics and the local correlation between the depth image and the color image. These priors are integrated in an energy minimization formulation, where the deep neural network learns the unary term, the depth field statistics work as a global model constraint, and the color-depth correlation is used to enforce the local structure in depth images. Extensive experiments on various depth super-resolution benchmark datasets show that our method outperforms the state-of-the-art depth image super-resolution methods by a margin.
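One way to picture the three-part energy described above is the generic form below. This is an illustrative sketch only, since the abstract does not state the actual formulation. Every symbol is an assumption: $D$ is the high-resolution depth map being estimated, $\hat{D}$ is the network's prediction (the unary term), $\Phi$ stands in for the unspecified depth field statistics constraint, $w_{ij}(I)$ is a weight derived from the color image $I$, $\mathcal{N}(i)$ is the neighborhood of pixel $i$, and $\lambda_1, \lambda_2$ are balancing weights.

```latex
E(D) = \underbrace{\sum_{i} \bigl(D_i - \hat{D}_i\bigr)^2}_{\text{unary term (network output)}}
     + \lambda_1 \underbrace{\Phi(D)}_{\text{global depth statistics}}
     + \lambda_2 \underbrace{\sum_{i} \sum_{j \in \mathcal{N}(i)} w_{ij}(I)\,\bigl(D_i - D_j\bigr)^2}_{\text{local color-depth correlation}}
```

Under this reading, the color-guided weights $w_{ij}(I)$ would be small across color edges, so that depth discontinuities are allowed where the color image also changes sharply.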