2021
DOI: 10.1145/3478513.3480546

Layered neural atlases for consistent video editing

Abstract: We present a method that decomposes, and "unwraps", an input video into a set of layered 2D atlases, each providing a unified representation of the appearance of an object (or background) over the video. For each pixel in the video, our method estimates its corresponding 2D coordinate in each of the atlases, giving us a consistent parameterization of the video, along with an associated alpha (opacity) value. Importantly, we design our atlases to be interpretable…

Cited by 55 publications (32 citation statements)
References 36 publications
“…Although our method outputs a rich video decomposition, and not simply object masks, we evaluate it on standard video object segmentation benchmarks where we obtain competitive results. On DAVIS [26], we obtain decomposition results that are similar to recent approaches that require user mask initialization [11], while being faster to optimize (30 minutes vs. 10 hours) due to the low dimensionality of our deformation model. We further demonstrate our approach on a variety of Internet clips, where off-the-shelf segmentation methods do not generalize to discover meaningful groupings.…”
Section: Introduction (supporting)
confidence: 70%
“…As such, we can easily recover Deformable Sprites for objects that do not fall into common categories. This is in contrast to layer decomposition methods such as [21], [27] or [11], which rely on input masks, either from off-the-shelf pre-trained segmentation models or from user interaction. In Figure 4, we show the masks we obtain from our representation compared to off-the-shelf Mask RCNN, and two recent motion segmentation methods [40], [38], trained on DAVIS.…”
Section: Qualitative Results On Real Videos (mentioning)
confidence: 99%
“…Most notably, the Neural Radiance Field (NeRF) [Mildenhall et al 2020] replaces the traditional notion of geometry and appearance with a single neural network where any new camera views can be realistically rendered by querying respective rays from the camera via neural inference. Despite its effectiveness, NeRF and its extensions have been largely focused on the static object, with a few exceptions [Kasten et al 2021; Lu et al 2020; Zhang et al 2021a] to directly tackle dynamic scenes. Further, existing solutions are still a few orders of magnitudes slower than real-time to support immersive volumography.…”
Section: Introduction (mentioning)
confidence: 99%