2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.01246

Zoom-In-To-Check: Boosting Video Interpolation via Instance-Level Discrimination

Abstract: We propose a light-weight video frame interpolation algorithm. Our key innovation is an instance-level supervision that allows information to be learned from the high-resolution version of similar objects. Our experiments show that the proposed method generates state-of-the-art results across different datasets, with a fraction of the computation resources (time and memory) of competing methods. Given two image frames, a cascade network creates an intermediate frame with 1) a flow-warping module that computes coarse…
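The flow-warping step named in the abstract is ordinary backward warping toward the intermediate time. The sketch below is an illustration under our own assumptions rather than the authors' released code: it warps each input frame with an externally estimated flow field via PyTorch's grid_sample and averages the two warped frames into a coarse middle frame; the flow estimator and the subsequent refinement stages are assumed to exist elsewhere.

import torch
import torch.nn.functional as F

def backward_warp(frame, flow):
    """Warp frame (B, C, H, W) by per-pixel displacements flow (B, 2, H, W)."""
    b, _, h, w = frame.shape
    # Base sampling grid of pixel coordinates (x, y).
    ys, xs = torch.meshgrid(
        torch.arange(h, device=frame.device, dtype=frame.dtype),
        torch.arange(w, device=frame.device, dtype=frame.dtype),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=0).unsqueeze(0).expand(b, -1, -1, -1)
    # Shift by the flow, then normalize coordinates to [-1, 1] for grid_sample.
    coords = grid + flow
    x_norm = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    y_norm = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    sample_grid = torch.stack((x_norm, y_norm), dim=-1)  # (B, H, W, 2)
    return F.grid_sample(frame, sample_grid, align_corners=True)

def coarse_intermediate(frame0, frame1, flow_t0, flow_t1):
    """Average the two inputs after warping them toward the intermediate time."""
    return 0.5 * (backward_warp(frame0, flow_t0) + backward_warp(frame1, flow_t1))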

Cited by 31 publications (30 citation statements) | References 30 publications
“…Sun et al. (Sun et al. 2019) select and weight synthetic pixels that are similar to real ones for learning semantic segmentation. Yuan et al. (Yuan et al. 2019) introduce an instance-level adversarial loss for the video frame interpolation problem. Shen et al. (Shen et al. 2019) propose an instance-aware image-to-image translation framework.…”
Section: Related Work (mentioning)
confidence: 99%
“…DAIN (Bao et al. 2019) used PWC-Net (Sun et al. 2018) and a depth network (Chen et al. 2016) to explicitly detect occlusion. Other interpolation methods such as CtxSyn (Niklaus and Liu 2018) and Zoom-In-to-Check (Yuan et al. 2019) warped not only the input frames but also their corresponding deep features. Although the approaches mentioned so far have been shown to be effective, their performance can be limited when the estimated optical flow or occlusion masks are less accurate.…”
Section: Flow-Based Methods (mentioning)
confidence: 99%
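The occlusion handling this excerpt refers to is typically a per-pixel visibility weighting of the two flow-warped candidates. The short sketch below is a generic illustration under our own assumptions, not the implementation of any of the cited methods: pixels that are occluded in one input are filled from the other, with an optional temporal weight t.

import torch

def visibility_blend(warped0, warped1, vis0, vis1, t=0.5, eps=1e-6):
    """Blend two frames warped to time t (B, C, H, W) using visibility maps (B, 1, H, W) in [0, 1]."""
    w0 = (1.0 - t) * vis0  # weight frame 0 more when t is small and its pixels are visible
    w1 = t * vis1          # weight frame 1 more when t is large and its pixels are visible
    return (w0 * warped0 + w1 * warped1) / (w0 + w1 + eps)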
“…Compared with SepConv (Niklaus, Mai, and Liu 2017b), our method considers more relevant pixels far away from the local grid (black rectangle) with a much smaller kernel size and performs better. … estimating flow information together with occlusion masks or visibility maps with deep convolutional neural networks (CNNs) (Jiang et al. 2018; Bao et al. 2019; 2018a; Liu et al. 2017; van Amersfoort et al. 2017; Liu et al. 2019; Xue et al. 2019; Peleg et al. 2019; Yuan et al. 2019; Hannemose et al. 2019).…”
Section: Introduction (mentioning)
confidence: 99%
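The local-grid, kernel-based synthesis that this excerpt contrasts against (in the SepConv spirit: per-pixel separable 1D kernels applied to a local patch) can be sketched roughly as below. The kernel size, tensor names, and the network that would predict the per-pixel kernels are illustrative assumptions, not the cited papers' actual code.

import torch
import torch.nn.functional as F

def separable_local_synthesis(frame, k_vert, k_horiz):
    """frame: (B, C, H, W); k_vert, k_horiz: (B, K, H, W) per-pixel 1D kernels, K odd."""
    b, c, h, w = frame.shape
    k = k_vert.shape[1]
    # Extract a K x K patch around every pixel: (B, C*K*K, H*W).
    patches = F.unfold(frame, kernel_size=k, padding=k // 2)
    patches = patches.view(b, c, k, k, h, w)
    # Outer product of the two 1D kernels gives a full K x K kernel per pixel.
    weights = k_vert.view(b, 1, k, 1, h, w) * k_horiz.view(b, 1, 1, k, h, w)
    # Weighted sum over the patch reproduces adaptive, local-grid synthesis.
    return (patches * weights).sum(dim=(2, 3))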
“…Toward more sophisticated motion modeling, several novel approaches [19, 65, 3, 45], as well as higher-order representations, have been proposed. Quadratic [62, 23] and cubic [8] flows are estimated from multiple input frames.…”
Section: Related Work (mentioning)
confidence: 99%