2019 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2019.00281

An Internal Learning Approach to Video Inpainting

Abstract: We propose a novel video inpainting algorithm that simultaneously hallucinates missing appearance and motion (optical flow) information, building upon the recent 'Deep Image Prior' (DIP) that exploits convolutional network architectures to enforce plausible texture in static images. In extending DIP to video we make two important contributions. First, we show that coherent video inpainting is possible without a priori training. We take a generative approach to inpainting based on internal (within-video) learning…
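The core of the internal learning idea sketched in the abstract is that a generator is optimized from scratch on the single input video, with a fixed noise code as input and a reconstruction loss applied only to known (unmasked) pixels; the convolutional prior then fills the hole. Below is a minimal PyTorch sketch of that loop. The tiny generator, tensor shapes, and hyperparameters are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in generator (an assumption; the paper uses an encoder-decoder network).
net = nn.Sequential(
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1),
)

frames = torch.rand(8, 3, 64, 64)                 # the one input video (toy size)
masks = (torch.rand(8, 1, 64, 64) > 0.2).float()  # 1 = known pixel, 0 = hole
noise = torch.randn(8, 32, 64, 64)                # fixed per-frame noise codes

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(2000):
    pred = net(noise)
    # Supervise only where pixels are known; the network's inductive bias
    # (shared convolutional filters across the whole video) inpaints the rest.
    loss = F.l1_loss(pred * masks, frames * masks)
    opt.zero_grad()
    loss.backward()
    opt.step()

inpainted = frames * masks + net(noise) * (1 - masks)  # composite final result
```

The paper's contribution beyond this DIP-style loop is to generate optical flow jointly with appearance and couple the two outputs; a sketch of such a coupling term appears after the last citation statement below.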

Cited by 75 publications (60 citation statements) | References 44 publications

Citation statements (ordered by relevance):
“…The sequence-to-sequence cloud removal method [28] follows the 3D Encoder-Decoder architecture of [29], constituted of an encoder as well as a decoder component. Both components are arranged symmetrically in the style of U-Net [30] and linked via skip connections between paired layers.…”
Section: B. Internal Learning for Sequence-to-Sequence Cloud Removal (mentioning, confidence: 99%)
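The quote describes a symmetric 3D encoder-decoder with skip connections between paired layers. A minimal sketch of that shape, assuming toy channel counts and a single downsampling level, might look like this; it is not the cited paper's exact network.

```python
import torch
import torch.nn as nn

class UNet3D(nn.Module):
    """Minimal symmetric 3D encoder-decoder with a paired-layer skip connection."""
    def __init__(self, in_ch=3, out_ch=3, base=16):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv3d(in_ch, base, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv3d(base, base * 2, 3, stride=2, padding=1), nn.ReLU())
        self.bottleneck = nn.Sequential(nn.Conv3d(base * 2, base * 2, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose3d(base * 2, base, 2, stride=2)
        # Decoder mirrors the encoder; the skip doubles its input channels.
        self.dec1 = nn.Sequential(nn.Conv3d(base * 2, base, 3, padding=1), nn.ReLU())
        self.out = nn.Conv3d(base, out_ch, 1)

    def forward(self, x):               # x: (N, C, T, H, W)
        s1 = self.enc1(x)               # skip connection source
        s2 = self.enc2(s1)              # downsample in T, H, and W
        u = self.up(self.bottleneck(s2))
        u = torch.cat([u, s1], dim=1)   # paired-layer skip connection
        return self.out(self.dec1(u))

y = UNet3D()(torch.randn(1, 3, 8, 64, 64))  # e.g. an 8-frame clip
```

The 3D convolutions mix information across time as well as space, which is what lets a single encoder-decoder operate on a whole frame sequence at once.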
“…Moreover, the point estimator receives tuples of S1 and S2 inputs, whereas the network of Fig. 8 is driven solely by S1 data (or Gaussian noise, as proposed in [33], [29]). Finally, the sequence-to-point network of Fig.…”
Section: B. Internal Learning for Sequence-to-Sequence Cloud Removal (mentioning, confidence: 99%)
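The parenthetical in this quote is the same DIP-style trick as in the loop above: the network input can be real sensor data or simply a fixed Gaussian noise code sampled once and never updated. A small illustration, with shapes and channel counts as assumptions:

```python
import torch

T, H, W = 8, 64, 64

# Option A (assumed shapes): drive the network with real S1 frames.
s1_frames = torch.rand(1, 2, T, H, W)

# Option B: drive it with a fixed Gaussian noise code instead; the code is
# sampled once and held constant, so only the network weights are trained.
noise_code = torch.randn(1, 2, T, H, W)

net_input = noise_code  # either tensor can serve as the network input
```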
“…The aim of video-to-video synthesis (vid2vid) [1], [3] is to convert an input semantic video to an output convincing video. Generally speaking, video restoration [18]-[23], including super-resolution [24]-[31], deblurring [32]-[37], dehazing [38]-[44], blending [45], [46] and future video prediction [47]-[53] can be considered as different research directions of the video-to-video synthesis issues. A routine approach is to represent source video as consecutive frames in order, and then generate target video from the model-processed images according to the time sequence.…”
Section: B. Video-to-Video Synthesis (mentioning, confidence: 99%)
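The "routine approach" in the last sentence of that quote is simply per-frame processing with the temporal order preserved. A minimal sketch, with `model` standing in for any hypothetical per-frame translator:

```python
import torch

def frames_to_video(model, source_frames):
    """Baseline vid2vid: translate frames independently, keep time order.

    Temporal coherence is not enforced here, which is exactly the weakness
    that temporally-aware methods (like the surveyed ones) try to address.
    """
    with torch.no_grad():
        return torch.stack([model(f) for f in source_frames], dim=0)

# Usage with a stand-in identity "model" on a toy 8-frame clip:
clip = torch.rand(8, 3, 64, 64)
out = frames_to_video(lambda f: f, clip)
```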
“…Extending this problem to video brings more challenges as the inpainted content needs to be consistent across the frames of the video. Zhang et al [80] proposed a UNNP-based video inpainting algorithm that is able to generate missing appearance and motion information, while enforcing visually plausible textures. Furthermore, they showed that their proposed framework is able to ensure mutual consistency of both appearance and optical flow of the video.…”
Section: Video Inpainting (mentioning, confidence: 99%)
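The "mutual consistency of both appearance and optical flow" mentioned here can be expressed as a loss that warps the generated next frame backward by the generated flow and penalizes disagreement with the generated current frame. The sketch below shows one common formulation of such a term; it is an assumption for illustration, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def flow_consistency_loss(frame_t, frame_t1, flow_t):
    """Couple generated appearance and generated flow.

    frame_t, frame_t1: (N, 3, H, W) generated frames at t and t+1
    flow_t:            (N, 2, H, W) generated forward flow t -> t+1, in pixels
    """
    n, _, h, w = frame_t.shape
    # Base sampling grid in pixel coordinates (x, y).
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0).to(frame_t)
    coords = grid + flow_t                       # where each pixel lands at t+1
    # Normalize to [-1, 1]; grid_sample expects (N, H, W, 2) ordered as (x, y).
    coords_x = 2 * coords[:, 0] / (w - 1) - 1
    coords_y = 2 * coords[:, 1] / (h - 1) - 1
    sample_grid = torch.stack((coords_x, coords_y), dim=-1)
    warped = F.grid_sample(frame_t1, sample_grid, align_corners=True)
    return F.l1_loss(warped, frame_t)

# Toy check: zero flow makes the warp an identity, so identical frames give ~0 loss.
f = torch.rand(1, 3, 16, 16)
print(flow_consistency_loss(f, f, torch.zeros(1, 2, 16, 16)))
```

Because both the frames and the flow come out of the same internally-learned generator, the gradient of this term pushes the two modalities to agree with each other rather than treating one as ground truth.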