The hierarchical structure and distinct semantic roles of joints in the human skeleton convey important information for action recognition. Conventional graph convolution methods for modeling skeleton structure consider only the physically connected neighbors of each joint and joints of the same type, and thus fail to capture high-order information. In this work, we propose a novel model with motif-based graph convolution to encode hierarchical spatial structure, and a variable temporal dense block to exploit local temporal information over different ranges of human skeleton sequences. Moreover, we employ a non-local block with an attention mechanism to capture global dependencies in the temporal domain. Our model achieves improvements over state-of-the-art methods on two large-scale datasets.
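The core operation named above can be illustrated with a minimal sketch of graph convolution over a skeleton using several adjacency matrices (one per "motif", e.g. physical bones versus joints of the same semantic type). The joint count, connections, and feature sizes below are invented for illustration and are not taken from the paper:

```python
import numpy as np

# Two hypothetical "motif" adjacencies over 5 joints: physical bone
# connections and same-type (semantic) joint pairings.
num_joints, in_dim, out_dim = 5, 3, 4
A_physical = np.eye(num_joints)
A_physical[0, 1] = A_physical[1, 0] = 1.0  # e.g. a hip-knee bone
A_semantic = np.eye(num_joints)
A_semantic[1, 3] = A_semantic[3, 1] = 1.0  # e.g. left/right knees

def normalize(A):
    """Symmetric degree normalization D^{-1/2} A D^{-1/2}."""
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A @ D_inv_sqrt

rng = np.random.default_rng(0)
X = rng.standard_normal((num_joints, in_dim))              # per-joint features
W = [rng.standard_normal((in_dim, out_dim)) for _ in range(2)]

# Sum the contributions of each motif's normalized adjacency,
# each with its own learned weight matrix.
Y = sum(normalize(A) @ X @ Wk
        for A, Wk in zip([A_physical, A_semantic], W))
print(Y.shape)  # (5, 4)
```

Using multiple adjacencies lets joints aggregate information from both physical neighbors and semantically related joints in a single layer, which is the intuition behind capturing higher-order structure.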
Patch-based image synthesis methods have been successfully applied to various editing tasks on still images, videos, and stereo pairs. In this work we extend patch-based synthesis to plenoptic images captured by consumer-level lenslet-based devices for interactive, efficient light field editing. In our method, the light field is represented as a set of images captured from different viewpoints. We decompose the central view into different depth layers and present it to the user for specifying the editing goals. Given an editing task, our method performs patch-based image synthesis on all affected layers of the central view, and then propagates the edits to all other views. Interaction is done through a conventional 2D image editing user interface that is familiar to novice users. Our method correctly handles object boundary occlusion with semi-transparency, and can thus generate more realistic results than previous methods. We demonstrate compelling results on a wide range of applications such as hole-filling, object reshuffling and resizing, changing object depth, light field upscaling, and parallax magnification.
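The propagation step described above can be sketched at its simplest: warping an edit made in the central view into a neighbouring view using per-pixel disparity. The tiny image, constant disparity, and one-pixel baseline below are invented for illustration; the paper's actual method is patch-based and additionally handles occlusion with semi-transparency:

```python
import numpy as np

# Hypothetical edited central view of a light field and its disparity map.
H, W_ = 4, 6
edited = np.arange(H * W_, dtype=float).reshape(H, W_)  # edited central view
disparity = np.ones((H, W_), dtype=int)   # pixels of shift per unit baseline

def propagate(view_offset):
    """Forward-warp the edited central view to a view at `view_offset`."""
    out = np.full((H, W_), np.nan)  # NaN marks holes needing re-synthesis
    for y in range(H):
        for x in range(W_):
            x2 = x + view_offset * disparity[y, x]
            if 0 <= x2 < W_:
                out[y, x2] = edited[y, x]
    return out

right_view = propagate(view_offset=1)
# The leftmost column of the new view receives no pixels: a disocclusion
# hole that a patch-based method would fill from the surrounding layers.
print(np.isnan(right_view[:, 0]).all())  # True
```

In a real pipeline this warp runs per depth layer, and the holes it exposes are exactly where patch-based synthesis is applied to keep all views consistent.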
Researchers have achieved great success in dealing with 2D images using deep learning. In recent years, 3D computer vision and geometric deep learning have gained ever more attention. Many advanced techniques for 3D shapes have been proposed for different applications. Unlike 2D images, which can be uniformly represented by a regular grid of pixels, 3D shapes have various representations, such as depth images, multi-view images, voxels, point clouds, meshes, and implicit surfaces. The performance achieved in different applications largely depends on the representation used, and no single representation works well for all applications. Therefore, in this survey, we review recent developments in deep learning for 3D geometry from a representation perspective, summarizing the advantages and disadvantages of different representations for different applications. We also present existing datasets in these representations and further discuss future research directions.
Virtual reality (VR) offers an artificial, computer-generated simulation of a real-life environment. It originated in the 1960s and has evolved to provide increasing immersion, interactivity, imagination, and intelligence. Because deep learning systems are able to represent and compose information at various levels in a deep hierarchical fashion, they can build very powerful models that leverage large quantities of visual media data. The intelligence of VR methods and applications has been significantly boosted by recent developments in deep learning techniques. VR content creation and exploration relates to image and video analysis, synthesis, and editing, so deep learning methods such as fully convolutional networks and generative adversarial networks are widely employed, designed specifically to handle panoramic images, video, and virtual 3D scenes. This article surveys recent research that uses such deep learning methods for VR content creation and exploration. It considers the problems involved, and discusses possible future directions in this active and emerging research area. Keywords: virtual reality; deep learning; neural networks; 360° image and video; virtual content
This paper surveys the state-of-the-art of research in patch-based synthesis. Patch-based methods synthesize output images by copying small regions from exemplar imagery. This line of research originated from an area called "texture synthesis", which focused on creating regular or semi-regular textures from small exemplars. However, more recently, much research has focused on synthesis of larger and more diverse imagery, such as photos, photo collections, videos, and light fields. Additionally, recent research has focused on customizing the synthesis process for particular problem domains, such as synthesizing artistic or decorative brushes, synthesis of rich materials, and synthesis for 3D fabrication. This report investigates recent papers that follow these themes, with a particular emphasis on papers published since 2009, when the last survey in this area was published. This survey can serve as a tutorial for readers who are not yet familiar with these topics, as well as provide comparisons between these papers, and highlight some open problems in this area.
Figure 1: An example of object-based image manipulation: augmenting an object in a scene. (a) Original image with the object (horse) selected in red and the augmentation (jockey) sketched in green. (b) The output image containing the added jockey and other regions of the source horse (saddle, reins, shadows). (c) Alternative results. (d) Internet photos used as sources.

Abstract: We present a framework for interactively manipulating objects in a photograph using related objects obtained from internet images. Given an image, the user selects an object to modify and provides keywords to describe it. Objects with a similar shape are retrieved and segmented from online images matching the keywords, and deformed to correspond with the selected object. By matching the candidate object and adjusting manipulation parameters, our method appropriately modifies candidate objects and composites them into the scene. Supported manipulations include transferring texture, color, and shape from the matched object to the target in a seamless manner. We demonstrate the versatility of our framework using several inputs of varying complexity, for object completion, augmentation, replacement, and revealing. Our results are evaluated using a user study.
Abstract: This paper proposes an image enhancement method that optimizes photo composition by rearranging foreground objects in the photo. To adjust objects' positions while keeping the original scene content, we first perform a novel structure dependence analysis on the image to obtain the dependencies between all background regions. To determine the optimal positions for foreground objects, we formulate an optimization problem based on widely used heuristics for aesthetically pleasing pictures. Semantic relations between foreground objects are also taken into account during optimization. The final output is produced by moving foreground objects, together with their dependent regions, to optimal positions. The results show that our approach can effectively optimize photos with single or multiple foreground objects without compromising the original photo content.
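One widely used heuristic of the kind the abstract mentions can be illustrated with a minimal sketch: scoring candidate object positions against the rule of thirds. The image size and candidate positions below are invented for illustration; the paper combines several such terms with structure dependencies and semantic relations in a joint optimization:

```python
import numpy as np

# Hypothetical image and the four rule-of-thirds intersection points.
img_w, img_h = 600, 400
thirds = [(img_w * i / 3, img_h * j / 3) for i in (1, 2) for j in (1, 2)]

def thirds_cost(x, y):
    """Distance from an object's centre to the nearest thirds point."""
    return min(np.hypot(x - tx, y - ty) for tx, ty in thirds)

# Candidate placements for a foreground object's centre.
candidates = [(300, 200), (200, 133), (50, 50)]
best = min(candidates, key=lambda p: thirds_cost(*p))
print(best)  # (200, 133), closest to a thirds intersection
```

A full composition optimizer would add further cost terms (balance, object spacing, semantic constraints) and move each object's dependent background regions along with it, as the abstract describes.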
This paper proposes an approach to content-preserving image stitching with regular boundary constraints, which aims to stitch multiple images into a panoramic image with a regular boundary. Existing methods treat image stitching and rectangling as two separate steps, which may produce suboptimal results because the stitching process is unaware of the subsequent warping needed for rectangling. We address these limitations by formulating image stitching with regular boundaries as a unified optimization. Starting from the initial stitching results produced by traditional warping-based optimization, we obtain the irregular boundary from the warped meshes by polygon Boolean operations, which robustly handle arbitrary mesh compositions, and by analyzing the irregular boundary we construct a piecewise rectangular boundary. Based on this, we further incorporate straight-line preservation and regular boundary constraints into the image stitching framework, and conduct iterative optimization to obtain an optimal piecewise rectangular boundary. This makes the boundary of the stitching results as close as possible to a rectangle while reducing unwanted distortions. We further extend our method to selfie expansion and video stitching by integrating portrait preservation and temporal coherence into the optimization. Experiments show that our method efficiently produces visually pleasing panoramas with regular boundaries and unnoticeable distortions.