Monocular depth estimation, which plays a crucial role in understanding 3D scene geometry, is an ill-posed problem. Recent methods have gained significant improvement by exploring image-level information and hierarchical features from deep convolutional neural networks (DCNNs). These methods model depth estimation as a regression problem and train the regression networks by minimizing mean squared error, which suffers from slow convergence and unsatisfactory local solutions. Besides, existing depth estimation networks employ repeated spatial pooling operations, resulting in undesirable low-resolution feature maps. To obtain high-resolution depth maps, skip-connections or multilayer deconvolution networks are required, which complicates network training and consumes much more computations. To eliminate or at least largely reduce these problems, we introduce a spacing-increasing discretization (SID) strategy to discretize depth and recast depth network learning as an ordinal regression problem. By training the network using an ordinary regression loss, our method achieves much higher accuracy and faster convergence in synch. Furthermore, we adopt a multi-scale network structure which avoids unnecessary spatial pooling and captures multi-scale information in parallel. The proposed deep ordinal regression network (DORN) achieves state-of-the-art results on three challenging benchmarks, i.e., KITTI [16], Make3D [49], and NYU Depth v2 [41], and outperforms existing methods by a large margin.
Supervised depth estimation has achieved high accuracy due to the advanced deep network architectures. Since the groundtruth depth labels are hard to obtain, recent methods try to learn depth estimation networks in an unsupervised way by exploring unsupervised cues, which are effective but less reliable than true labels. An emerging way to resolve this dilemma is to transfer knowledge from synthetic images with ground truth depth via domain adaptation techniques. However, these approaches overlook specific geometric structure of the natural images in the target domain (i.e., real data), which is important for highperforming depth prediction. Motivated by the observation, we propose a geometry-aware symmetric domain adaptation framework (GASDA) to explore the labels in the synthetic data and epipolar geometry in the real data jointly. Moreover, by training two image style translators and depth estimators symmetrically in an end-to-end network, our model achieves better image style transfer and generates high-quality depth maps. The experimental results demonstrate the effectiveness of our proposed method and comparable performance against the state-of-the-art. Code will be publicly available at: https://github.com/ sshan-zhao/GASDA.
Unsupervised domain mapping aims to learn a function to translate domain X to Y by a function G XY in the absence of paired examples. Finding the optimal G XY without paired data is an ill-posed problem, so appropriate constraints are required to obtain reasonable solutions. One of the most prominent constraints is cycle consistency, which enforces the translated image by G XY to be translated back to the input image by an inverse mapping G Y X . While cycle consistency requires the simultaneous training of G XY and G Y X , recent studies have shown that one-sided domain mapping can be achieved by preserving pairwise distances between images. Although cycle consistency and distance preservation successfully constrain the solution space, they overlook the special properties of images that simple geometric transformations do not change the image's semantic structure. Based on this special property, we develop a geometry-consistent generative adversarial network (GcGAN), which enables one-sided unsupervised domain mapping. GcGAN takes the original image and its counterpart image transformed by a predefined geometric transformation as inputs and generates two images in the new domain coupled with the corresponding geometry-consistency constraint. The geometryconsistency constraint reduces the space of possible solutions while keep the correct solutions in the search space. Quantitative and qualitative comparisons with the baseline (GAN alone) and the state-of-the-art methods including CycleGAN [62] and DistanceGAN [5] demonstrate the effectiveness of our method. * equal contribution arXiv:1809.05852v2 [cs.CV] 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970
The human visual system excels at detecting local blur of visual images, but the underlying mechanism is not well understood. Traditional views of blur such as reduction in energy at high frequencies and loss of phase coherence at localized features have fundamental limitations. For example, they cannot well discriminate flat regions from blurred ones. Here we propose that high-level semantic information is critical in successfully identifying local blur. Therefore, we resort to deep neural networks that are proficient at learning high-level features and propose the first end-to-end local blur mapping algorithm based on a fully convolutional network. By analyzing various architectures with different depths and design philosophies, we empirically show that high-level features of deeper layers play a more important role than low-level features of shallower layers in resolving challenging ambiguities for this task. We test the proposed method on a standard blur detection benchmark and demonstrate that it significantly advances the state-of-the-art (ODS F-score of 0.853). Furthermore, we explore the use of the generated blur maps in three applications, including blur region segmentation, blur degree estimation, and blur magnification.
3D-FRONT is a new, large-scale, and comprehensive repository of synthetic indoor scenes with professionally and distinctively designed layouts, a large number (18,797) of rooms populated with 3D furniture objects that are stylistically compatible and endowed with high-quality textures. All freely available to the academic community and beyond.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.