Object viewpoint estimation from 2D images is an essential task in computer vision. However, two issues hinder its progress: scarcity of training data with viewpoint annotations, and a lack of powerful features. Inspired by the growing availability of 3D models, we propose a framework to address both issues by combining render-based image synthesis and CNNs. We believe that 3D models have the potential to generate a large number of images of high variation, which can be well exploited by deep CNNs with their high learning capacity. Towards this goal, we propose a scalable and overfit-resistant image synthesis pipeline, together with a novel CNN specifically tailored for the viewpoint estimation task. Experimentally, we show that viewpoint estimation from our pipeline significantly outperforms state-of-the-art methods on the PASCAL 3D+ benchmark.
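The synthesis-for-viewpoint idea above can be illustrated with a small sketch: sample random camera viewpoints for rendering a 3D model, and discretize the continuous azimuth into class labels for a CNN classifier. This is a minimal, hypothetical sketch; the bin count, angle ranges, and angle convention are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def viewpoint_to_bin(azimuth_deg, n_bins=360):
    """Map a continuous azimuth angle (degrees) to a discrete class label.

    Assumption for illustration: uniform bins over [0, 360); the actual
    pipeline's discretization may differ.
    """
    return int((azimuth_deg % 360.0) // (360.0 / n_bins))

def sample_random_viewpoints(n, seed=None):
    """Sample random (azimuth, elevation) pairs for rendering a 3D model."""
    rng = np.random.default_rng(seed)
    azimuth = rng.uniform(0.0, 360.0, size=n)     # full rotation around the object
    elevation = rng.uniform(-10.0, 45.0, size=n)  # plausible camera heights (assumed range)
    return azimuth, elevation
```

For example, `viewpoint_to_bin(725.0)` wraps the angle past two full turns and lands in bin 5.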
Human 3D pose estimation from a single image is a challenging task with numerous applications. Convolutional Neural Networks (CNNs) have recently achieved superior performance on the task of 2D pose estimation from a single image, by training on images with 2D annotations collected by crowdsourcing. This suggests that similar success could be achieved for direct estimation of 3D poses. However, 3D poses are much harder to annotate, and the lack of suitable annotated training images hinders attempts towards end-to-end solutions. To address this issue, we opt to automatically synthesize training images with ground-truth pose annotations. Our work is a systematic study along this road. We find that pose space coverage and texture diversity are the key ingredients for the effectiveness of synthetic training data. We present a fully automatic, scalable approach that samples the human pose space to guide the synthesis procedure and extracts clothing textures from real images. Furthermore, we explore domain adaptation for bridging the gap between our synthetic training images and real testing photos. We demonstrate that CNNs trained with our synthetic images outperform those trained with real photos on 3D pose estimation tasks.
Figure 1: As a cluttered 3D scene is scanned and reconstructed in real time (top), we continuously query a large shape database, retrieving and registering similar objects to the scan (bottom). Noisy, partial scanned geometry is then replaced with the models, resulting in a complete, high-quality semantic reconstruction (middle).

Abstract: In recent years, real-time 3D scanning technology has developed significantly and is now able to capture large environments with considerable accuracy. Unfortunately, the reconstructed geometry still suffers from incompleteness, due to occlusions and lack of view coverage, resulting in unsatisfactory reconstructions. In order to overcome these fundamental physical limitations, we present a novel reconstruction approach based on retrieving objects from a 3D shape database while scanning an environment in real time. With this approach, we are able to replace scanned RGB-D data with complete, hand-modeled objects from shape databases. We align and scale retrieved models to the input data to obtain a high-quality virtual representation of the real-world environment that is quite faithful to the original geometry. In contrast to previous methods, we are able to retrieve objects in cluttered and noisy scenes even when the database contains only similar models, but no exact matches. In addition, we put a strong focus on object retrieval in an interactive scanning context: our algorithm runs directly on 3D scanning data structures, and is able to query databases of thousands of models in an online fashion during scanning.
We jointly embed shapes and images of three categories (chair, aeroplane, and car) into a shared space. Distances between entities in the high-dimensional embedding space reflect object similarities between shapes and images (visualized here via t-SNE).
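The retrieval use of such a shared space can be sketched in a few lines: once images and shapes live in the same embedding, cross-modal retrieval reduces to nearest-neighbor search. This is a minimal illustrative sketch, assuming Euclidean distance and precomputed embeddings; the actual embedding dimensionality and metric are assumptions, not details from the work above.

```python
import numpy as np

def nearest_shapes(image_emb, shape_embs, k=3):
    """Return indices of the k shape embeddings closest to an image embedding.

    image_emb:  (d,) vector for one image in the shared space.
    shape_embs: (n, d) matrix, one row per shape.
    """
    dists = np.linalg.norm(shape_embs - image_emb, axis=1)  # Euclidean distances
    return np.argsort(dists)[:k]  # indices of the k nearest shapes
```

With toy 2D embeddings, an image vector near one shape cluster retrieves that cluster's shapes first.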
Figure 1: (Top) Dishlia growth time-lapse point cloud over 5 weeks, with classified organs and detected budding, bifurcation, and decay events. (Bottom) The extracted events are then used to bring a static plant model to life with both motion and growth.

Abstract: Studying growth and development of plants is of central importance in botany. Current quantitative methods are either limited to tedious and sparse manual measurements, or to coarse image-based 2D measurements. The availability of cheap and portable 3D acquisition devices has the potential to automate this process and easily provide scientists with volumes of accurate data, at a scale far beyond the reach of existing methods. However, during their development, plants grow new parts (e.g., vegetative buds) and bifurcate into different components, violating the central incompressibility assumption made by existing acquisition algorithms, which makes these algorithms unsuited for analyzing growth. We introduce a framework to study plant growth, particularly focusing on accurate localization and tracking of topological events like budding and bifurcation. This is achieved by a novel forward-backward analysis, wherein we track robustly detected plant components back in time to ensure correct spatio-temporal event detection using a locally adapting threshold. We evaluate our approach on several groups of time-lapse scans, often ranging from days to weeks, on a diverse set of plant species, and use the results to animate static virtual plants or directly attach them to physical simulators.
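The locally adapting threshold mentioned above can be illustrated on a toy scalar series: flag a time step as a candidate event when the change in a tracked component exceeds a threshold derived from recent local changes. This is a hypothetical one-dimensional sketch only; the actual method operates on tracked 3D point-cloud components, and the window size and scaling factor here are assumptions.

```python
import numpy as np

def detect_growth_events(sizes, window=3, factor=1.5):
    """Flag time steps where a component's size change exceeds a locally
    adaptive threshold (scaled median of changes over a trailing window).

    sizes: sequence of a tracked component's size over time.
    Returns indices into `sizes` where a candidate event occurs.
    """
    changes = np.abs(np.diff(np.asarray(sizes, dtype=float)))
    events = []
    for t in range(window, len(changes)):
        local = changes[t - window:t]                 # trailing window of changes
        thresh = factor * (np.median(local) + 1e-9)   # locally adapting threshold
        if changes[t] > thresh:
            events.append(t + 1)                      # index into the original series
    return events
```

A slow, steady drift stays below the adaptive threshold, while a sudden jump (e.g., a bud appearing) is flagged.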
Learning disentangled representations of data is a fundamental problem in artificial intelligence. Specifically, disentangled latent representations allow generative models to control and compose the disentangled factors in the synthesis process. Current methods, however, require extensive supervision and training, or instead noticeably compromise quality. In this paper, we present a method that learns how to represent data in a disentangled way, with minimal supervision, relying solely on available pre-trained networks. Our key insight is to decouple the processes of disentanglement and synthesis, by employing a leading pre-trained unconditional image generator, such as StyleGAN. By learning to map into its latent space, we leverage both its state-of-the-art quality and its rich and expressive latent space, without the burden of training it. We demonstrate our approach on the complex and high-dimensional domain of human heads. We evaluate our method qualitatively and quantitatively, and exhibit its success with de-identification operations and with temporal identity coherency in image sequences. Through extensive experimentation, we show that our method successfully disentangles identity from other facial attributes, surpassing existing methods, even though they require more training and supervision.
Figure 1: We attribute a single 2D image of an object (left) with depth by transporting information from a 3D shape deformation subspace learned by analyzing a network of related but different shapes (middle). For visualization, we color-code the estimated depth with values increasing from red to blue (right).

Abstract: Images, while easy to acquire, view, publish, and share, lack critical depth information. This poses a serious bottleneck for many image manipulation, editing, and retrieval tasks. In this paper we consider the problem of adding depth to an image of an object, effectively 'lifting' it back to 3D, by exploiting a collection of aligned 3D models of related object shapes. Our key insight is that, even when the imaged object is not contained in the shape collection, the network of shapes implicitly characterizes a shape-specific deformation subspace that regularizes the problem and enables robust diffusion of depth information from the shape collection to the input image. We evaluate our fully automatic approach on diverse and challenging input images, validate the results against Kinect depth readings, and demonstrate several imaging applications including depth-enhanced image editing and image relighting.
Non-oxidative, regioselective, and convergent access to densely functionalized oxazoles is realized in a functional-group-tolerant manner using alkynyl thioethers. Sulfur-terminated alkynes provide access to reactivity previously requiring strong donor-substituted alkynes such as ynamides. Sulfur does not act in an analogous donor fashion in this gold-catalyzed reaction, thus leading to complementary regioselective outcomes and addressing the limitations of using ynamides. Compared to other heteroatom-substituted alkynes, alkynyl thioethers are remarkably little explored in intermolecular late-transition-metal catalysis, despite being readily accessed and robust. [1,2] Ynamides, in contrast, are privileged substrates: in π-acid catalysis their donor nature aids metal-alkyne coordination and affords highly polarized electrophiles, thus providing the high chemo- and regioselectivity required for the discovery of efficient intermolecular reactions (Scheme 1a). [3,4] As the resulting inclusion of a donor-nitrogen atom limits the utility of the products, retaining the reactivity profile of these transformations whilst accessing more flexible and readily elaborated substitution patterns would be desirable. The value of sulfur-substituted compounds [5] coupled with progress in C–C and C-heteroatom bond formation from C–S bonds [6] renders alkynyl thioethers appealing alternatives to ynamides. Indeed the ketenethionium pathway (Scheme 1a) from alkynyl thioethers has recently been invoked in proton-catalyzed reactions with nitriles [2g,h] and gold-catalyzed reactions with sulfides. [2i] Ynamides enabled the discovery of formal [3+2] dipolar cycloadditions with nucleophilic nitrenoids, [7] thus allowing intermolecular access to α-imino gold carbene-type reactivity for heterocycle synthesis (Scheme 1b). [8,9] Such reactions, which do not depend on ynamides, are scarce.
[8b,h] A strong donor alkyne substituent proved critical in the formation of oxazoles using N-acyl pyridinium N-aminides, as electron-rich alkynes such as anisole derivatives did not react (Scheme 1b, inset). [8a,b] Oxazoles are valuable synthetic intermediates [10,11] and structural components in bioactive natural products, [12] agrochemicals, [13] ligands, [14] and functional materials. [15] Despite recent advances, a single modular and convergent route to trisubstituted oxazoles, which provides the structural and functional-group diversity needed across the 2-, 4-, and 5-positions, remains unrealized. [16] Following our interest in the use of gold catalysis with sulfides [17] we report here on the reactivity of alkynyl thioethers with nucleophilic nitrenoids to prepare oxazoles. Importantly, the regioselectivity is not consistent with a controlling ketenethionium species. The sulfur group plays an alternative role in enabling reactivity, thus proving complementary to donor-enabled approaches. The reaction of the alkynyl thioether 1a and aminide 2a (Table 1) showed that conversion into the oxazole 3 was possible at 125 °C in 1,2-dichlorobenzene (1,2-DCB; …