2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021
DOI: 10.1109/iccv48922.2021.01358
Motion-Aware Dynamic Architecture for Efficient Frame Interpolation

Cited by 18 publications (8 citation statements) · References 35 publications
“…The area of video frame interpolation is much more diverse than these categories though. There is research on using more than two frames as input [6,35,55,65], interpolating footage from event cameras [34,60,62,67], efficient model design [10,11,13], test-time adaptation [9,53], utilizing hybrid imaging systems [49], handling quantization artifacts [61], as well as joint deblurring [55] and joint super-resolution [30,64]. Our proposed splatting-based synthesis technique is orthogonal to such research directions.…”
Section: Related Work
confidence: 99%
“…To improve model efficiency, Ding et al. [24] introduce model compression [55]. Spatio-temporal decoding methods are also proposed to directly convert spatio-temporal features into target frames via channel attention [27], [28] or 3D convolutions [29]. However, most of these methods generate outputs at a fixed time, typically halfway between the input images, which limits arbitrary-time interpolation and linearly increases the runtime for multi-frame interpolation.…”
Section: Related Work
confidence: 99%
“…The referenced research can roughly be categorized into motion-free and motion-based, depending on whether or not cues like optical flow are incorporated [18], [19], [20], [21], [22]. Motion-free models typically rely on kernel prediction [23], [24], [25], [26] or spatio-temporal decoding [27], [28], [29], which are effective but limited to interpolating frames at fixed time steps, and their runtime increases linearly with the number of desired output frames. On the other end of the spectrum, motion-based approaches establish dense correspondences between frames and apply warping to render the intermediate pixels.…”
Section: Introduction
confidence: 99%
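The contrast drawn in the quote above — fixed-time decoding versus motion-based warping at an arbitrary time t — comes down to one property: with a dense flow estimate, the flow can be rescaled for any target time under a linear-motion assumption and the input frame warped accordingly. The sketch below is a minimal NumPy illustration of that idea with nearest-neighbor sampling, not the implementation of any cited method; both function names are ours.

```python
import numpy as np

def backward_warp(frame, flow):
    """Sample `frame` at positions displaced by `flow` (nearest-neighbor
    sampling for brevity; real interpolators use differentiable bilinear
    sampling)."""
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return frame[src_y, src_x]

def interpolate_at(frame0, flow_0_to_1, t):
    """Render an intermediate frame at time t in (0, 1): under a linear
    motion assumption, the flow from time t back to frame 0 is roughly
    -t * flow_0_to_1, so one flow estimate serves every time step."""
    return backward_warp(frame0, -t * flow_0_to_1)
```

A kernel-prediction or spatio-temporal-decoding model trained for t = 0.5, by contrast, must be applied recursively (or retrained) to produce other time steps, which is the fixed-time limitation the quoted passages point out.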
“…Recent work has explored a few strategies for improving the performance of such methods. These efforts include utilizing additional contextual information to interpolate high-quality results [32], developing unsupervised techniques by cycle consistency [39], detecting the occlusion by exploring the depth information [2], forward-warping input frames using softmax splatting [33], using quadratic interpolation to overcome the limitation of linear models [24,50], leveraging the distillation loss to supervise the intermediate flows [17], and constructing efficient architectures for large resolution images [10,43]. We note that methods built upon convolutional networks generally face challenges of modeling long-term dependencies, thus limiting large motion handling.…”
Section: Video Frame Interpolation
confidence: 99%
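The quadratic interpolation mentioned in the last quote refers to modeling per-pixel acceleration from more than two input frames, rather than assuming constant velocity between the two neighbors. A sketch of the underlying arithmetic, assuming flow estimates from the center frame I_0 towards its neighbors I_-1 and I_1 (function and argument names are ours, not from the cited works):

```python
def quadratic_displacement(flow_0_to_m1, flow_0_to_1, t):
    """Quadratic motion model (per pixel; works on scalars or NumPy
    arrays): given the displacements of a pixel in I_0 towards I_-1 and
    I_1, solve d(tau) = v*tau + 0.5*a*tau**2 for velocity v and
    acceleration a, then evaluate at the target time t in (0, 1).
    A linear model would instead return t * flow_0_to_1."""
    a = flow_0_to_1 + flow_0_to_m1          # d(1) + d(-1) = a
    v = (flow_0_to_1 - flow_0_to_m1) / 2.0  # (d(1) - d(-1)) / 2 = v
    return v * t + 0.5 * a * t ** 2
```

When the two flows are symmetric (constant velocity), the acceleration term vanishes and this reduces to the linear model; asymmetric flows bend the predicted trajectory, which is how such methods overcome the limitation of linear models that the quote describes.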