Masked GAN for Unsupervised Depth and Pose Prediction With Scale Consistency

Zhao, Chaoqiang; Yen, Gary G.; Sun, Qiyu; Zhang, Chongzhen; Tang, Yang

doi:10.1109/tnnls.2020.3044181

Cited by 43 publications

(17 citation statements)

References 35 publications


“…The existing methods which are trained with monocular video sequences simultaneously predict the scene depths and estimate the camera poses [1, 3, 5, 13, 17, 18, 21, 24, 31, 34, 40-42, 44]. Zhou et al. [44] proposed an end-to-end approach comprised of two separate networks for predicting depths and camera poses.…”

Section: Self-supervised Monocular Training (mentioning)

confidence: 99%

“…We firstly evaluate the OCFD-Net with/without a postprocessing step (PP.) [12] on the raw KITTI Eigen test set [7] in comparison to 20 state-of-the-art methods, including 10 methods trained with monocular video sequences (M) [1, 17, 18, 21, 24, 31, 34, 42-44] and 10 methods trained with stereo image pairs (S) [12, 13, 15, 26-28, 32, 36, 37, 45]. As done in [15, 17], we also evaluate the OCFD-Net on the improved KITTI Eigen test set [33].…”

Section: Comparative Evaluation (mentioning)

confidence: 99%


Learning Occlusion-Aware Coarse-to-Fine Depth Map for Self-supervised Monocular Depth Estimation

Zhou, Dong. 2022. Preprint.


Self-supervised monocular depth estimation, aiming to learn scene depths from single images in a self-supervised manner, has received much attention recently. In spite of recent efforts in this field, how to learn accurate scene depths and alleviate the negative influence of occlusions for self-supervised depth estimation still remains an open problem. Addressing this problem, we first empirically analyze the effects of both the continuous and discrete depth constraints which are widely used in the training process of many existing works. Then, inspired by the above empirical analysis, we propose a novel network to learn an Occlusion-aware Coarse-to-Fine Depth map for self-supervised monocular depth estimation, called OCFD-Net. Given an arbitrary training set of stereo image pairs, the proposed OCFD-Net not only employs a discrete depth constraint for learning a coarse-level depth map, but also employs a continuous depth constraint for learning a scene depth residual, resulting in a fine-level depth map. In addition, an occlusion-aware module is designed under the proposed OCFD-Net, which is able to improve the capability of the learnt fine-level depth map for handling occlusions. Extensive experimental results on the public KITTI and Make3D datasets demonstrate that the proposed method outperforms 20 existing state-of-the-art methods in most cases.


“…Image and texture synthesis are challenging tasks [28], [29]. With the breakthrough of GANs [16], [30], [31], [32], [33], [34], [35], directly generating a handwritten text image has become an interesting topic. Non-recurrent generative methods [1], [9], [13], [14] can produce a handwritten text image according to a given text string.…”

Section: A Handwritten Text Image Synthesis (mentioning)

confidence: 99%

SLOGAN: Handwriting Style Synthesis for Arbitrary-Length and Out-of-Vocabulary Text

Luo, Zhu, Jin, et al. 2022. Preprint.


Large amounts of labeled data are urgently required for the training of robust text recognizers. However, collecting handwriting data of diverse styles, along with an immense lexicon, is considerably expensive. Although data synthesis is a promising way to relieve data hunger, two key issues of handwriting synthesis, namely, style representation and content embedding, remain unsolved. To this end, we propose a novel method that can synthesize parameterized and controllable handwriting Styles for arbitrary-Length and Out-of-vocabulary text based on a Generative Adversarial Network (GAN), termed SLOGAN. Specifically, we propose a style bank to parameterize the specific handwriting styles as latent vectors, which are input to a generator as style priors to achieve the corresponding handwritten styles. The training of the style bank requires only the writer identification of the source images, rather than attribute annotations. Moreover, we embed the text content by providing an easily obtainable printed style image, so that the diversity of the content can be flexibly achieved by changing the input printed image. Finally, the generator is guided by dual discriminators to handle both the handwriting characteristics that appear as separated characters and in a series of cursive joins. Our method can synthesize words that are not included in the training vocabulary and with various new styles. Extensive experiments have shown that high-quality text images with great style diversity and rich vocabulary can be synthesized using our method, thereby enhancing the robustness of the recognizer.


“…Occlusions and moving objects affect the pixel correspondence between images, thus impacting the photometric loss during training and resulting in the limited performance of the depth network. A number of methods [12], [27]-[29] design a mask or mask network to estimate the regions that violate the projection, so as to reduce the effect of these regions on the training process. Since the mask network is jointly trained with pose and depth networks in an unsupervised manner, this method cannot completely address the influence of occlusions and moving objects.…”

Section: Related Work (mentioning)

confidence: 99%

Unsupervised Monocular Depth Estimation in Highly Complex Environments

Zhao, Yang, Sun. 2021. Preprint. Self-citation.


Previous unsupervised monocular depth estimation methods mainly focus on the day-time scenario, and their frameworks are driven by warped photometric consistency. However, in some challenging environments, such as night, rainy night, or snowy winter, the photometry of the same pixel in different frames is inconsistent because of complex lighting and reflection, so the day-time unsupervised frameworks cannot be directly applied to these complex scenarios. In this paper, we investigate the problem of unsupervised monocular depth estimation in certain highly complex scenarios. We address this challenging problem by using domain adaptation, and a unified image transfer-based adaptation framework is proposed based on monocular videos in this paper. The depth model trained on day-time scenarios is adapted to different complex scenarios. Instead of adapting the whole depth network, we just consider the encoder network for lower computational complexity. The depth models adapted by the proposed framework to different scenarios share the same decoder, which is practical. Constraints on both feature space and output space promote the framework to learn the key features for depth decoding, and the smoothness loss is introduced into the adaptation framework for better depth estimation performance. Extensive experiments show the effectiveness of the proposed unsupervised framework in estimating the dense depth map from the night-time, rainy night-time and snowy winter images.
