Object viewpoint estimation from 2D images is an essential task in computer vision. However, two issues hinder its progress: scarcity of training data with viewpoint annotations, and a lack of powerful features. Inspired by the growing availability of 3D models, we propose a framework to address both issues by combining render-based image synthesis and CNNs. We believe that 3D models have the potential to generate a large number of images of high variation, which can be well exploited by deep CNNs with their high learning capacity. Towards this goal, we propose a scalable and overfit-resistant image synthesis pipeline, together with a novel CNN specifically tailored for the viewpoint estimation task. Experimentally, we show that viewpoint estimation from our pipeline significantly outperforms state-of-the-art methods on the PASCAL 3D+ benchmark.
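The synthesis-for-viewpoint idea above can be illustrated with a small sketch: sample random camera viewpoints for rendering a 3D model, and discretize the continuous azimuth into class labels for a CNN classifier. This is a minimal, hypothetical sketch; the bin count, angle ranges, and angle convention are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def viewpoint_to_bin(azimuth_deg, n_bins=360):
    """Map a continuous azimuth angle (degrees) to a discrete class label.

    Assumption for illustration: uniform bins over [0, 360); the actual
    pipeline's discretization may differ.
    """
    return int((azimuth_deg % 360.0) // (360.0 / n_bins))

def sample_random_viewpoints(n, seed=None):
    """Sample random (azimuth, elevation) pairs for rendering a 3D model."""
    rng = np.random.default_rng(seed)
    azimuth = rng.uniform(0.0, 360.0, size=n)     # full rotation around the object
    elevation = rng.uniform(-10.0, 45.0, size=n)  # plausible camera heights (assumed range)
    return azimuth, elevation
```

For example, `viewpoint_to_bin(725.0)` wraps the angle past two full turns and lands in bin 5.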
Human 3D pose estimation from a single image is a challenging task with numerous applications. Convolutional Neural Networks (CNNs) have recently achieved superior performance on the task of 2D pose estimation from a single image, by training on images with 2D annotations collected by crowdsourcing. This suggests that similar success could be achieved for direct estimation of 3D poses. However, 3D poses are much harder to annotate, and the lack of suitable annotated training images hinders attempts towards end-to-end solutions. To address this issue, we opt to automatically synthesize training images with ground-truth pose annotations. Our work is a systematic study along this road. We find that pose space coverage and texture diversity are the key ingredients for the effectiveness of synthetic training data. We present a fully automatic, scalable approach that samples the human pose space to guide the synthesis procedure and extracts clothing textures from real images. Furthermore, we explore domain adaptation for bridging the gap between our synthetic training images and real testing photos. We demonstrate that CNNs trained with our synthetic images outperform those trained with real photos on 3D pose estimation tasks.
Figure 1: As a cluttered 3D scene is scanned and reconstructed in real time (top), we continuously query a large shape database, retrieving and registering similar objects to the scan (bottom). Noisy, partial scanned geometry is then replaced with the models, resulting in a complete, high-quality semantic reconstruction (middle).

Abstract: In recent years, real-time 3D scanning technology has developed significantly and is now able to capture large environments with considerable accuracy. Unfortunately, the reconstructed geometry still suffers from incompleteness, due to occlusions and lack of view coverage, resulting in unsatisfactory reconstructions. In order to overcome these fundamental physical limitations, we present a novel reconstruction approach based on retrieving objects from a 3D shape database while scanning an environment in real time. With this approach, we are able to replace scanned RGB-D data with complete, hand-modeled objects from shape databases. We align and scale retrieved models to the input data to obtain a high-quality virtual representation of the real-world environment that is quite faithful to the original geometry. In contrast to previous methods, we are able to retrieve objects in cluttered and noisy scenes even when the database contains only similar models, but no exact matches. In addition, we put a strong focus on object retrieval in an interactive scanning context: our algorithm runs directly on 3D scanning data structures, and is able to query databases of thousands of models in an online fashion during scanning.
We jointly embed shapes and images of three categories (chair, aeroplane, and car) into a shared space. Distances between entities in the high-dimensional embedding space reflect object similarities between shapes and images (visualized here via t-SNE).
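The retrieval use of such a shared space can be sketched in a few lines: once images and shapes live in the same embedding, cross-modal retrieval reduces to nearest-neighbor search. This is a minimal illustrative sketch, assuming Euclidean distance and precomputed embeddings; the actual embedding dimensionality and metric are assumptions, not details from the work above.

```python
import numpy as np

def nearest_shapes(image_emb, shape_embs, k=3):
    """Return indices of the k shape embeddings closest to an image embedding.

    image_emb:  (d,) vector for one image in the shared space.
    shape_embs: (n, d) matrix, one row per shape.
    """
    dists = np.linalg.norm(shape_embs - image_emb, axis=1)  # Euclidean distances
    return np.argsort(dists)[:k]  # indices of the k nearest shapes
```

With toy 2D embeddings, an image vector near one shape cluster retrieves that cluster's shapes first.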
Figure 1: (Top) Dishlia growth time-lapse point cloud over 5 weeks, with classified organs and detected budding, bifurcation, and decay events. (Bottom) The extracted events are then used to bring a static plant model to life with both motion and growth.

Abstract: Studying growth and development of plants is of central importance in botany. Current quantitative methods are either limited to tedious and sparse manual measurements, or to coarse image-based 2D measurements. The availability of cheap and portable 3D acquisition devices has the potential to automate this process and easily provide scientists with volumes of accurate data, at a scale far beyond the reach of existing methods. However, during their development, plants grow new parts (e.g., vegetative buds) and bifurcate into different components, violating the central incompressibility assumption made by existing acquisition algorithms, which makes these algorithms unsuited for analyzing growth. We introduce a framework to study plant growth, particularly focusing on accurate localization and tracking of topological events like budding and bifurcation. This is achieved by a novel forward-backward analysis, wherein we track robustly detected plant components back in time to ensure correct spatio-temporal event detection using a locally adapting threshold. We evaluate our approach on several groups of time-lapse scans, often ranging from days to weeks, on a diverse set of plant species, and use the results to animate static virtual plants or directly attach them to physical simulators.
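The locally adapting threshold mentioned above can be illustrated on a toy scalar series: flag a time step as a candidate event when the change in a tracked component exceeds a threshold derived from recent local changes. This is a hypothetical one-dimensional sketch only; the actual method operates on tracked 3D point-cloud components, and the window size and scaling factor here are assumptions.

```python
import numpy as np

def detect_growth_events(sizes, window=3, factor=1.5):
    """Flag time steps where a component's size change exceeds a locally
    adaptive threshold (scaled median of changes over a trailing window).

    sizes: sequence of a tracked component's size over time.
    Returns indices into `sizes` where a candidate event occurs.
    """
    changes = np.abs(np.diff(np.asarray(sizes, dtype=float)))
    events = []
    for t in range(window, len(changes)):
        local = changes[t - window:t]                 # trailing window of changes
        thresh = factor * (np.median(local) + 1e-9)   # locally adapting threshold
        if changes[t] > thresh:
            events.append(t + 1)                      # index into the original series
    return events
```

A slow, steady drift stays below the adaptive threshold, while a sudden jump (e.g., a bud appearing) is flagged.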
Learning disentangled representations of data is a fundamental problem in artificial intelligence. Specifically, disentangled latent representations allow generative models to control and compose the disentangled factors in the synthesis process. Current methods, however, require extensive supervision and training, or instead noticeably compromise quality. In this paper, we present a method that learns how to represent data in a disentangled way, with minimal supervision, relying solely on available pre-trained networks. Our key insight is to decouple the processes of disentanglement and synthesis, by employing a leading pre-trained unconditional image generator, such as StyleGAN. By learning to map into its latent space, we leverage both its state-of-the-art quality and its rich and expressive latent space, without the burden of training it. We demonstrate our approach on the complex and high-dimensional domain of human heads. We evaluate our method qualitatively and quantitatively, and exhibit its success with de-identification operations and with temporal identity coherency in image sequences. Through extensive experimentation, we show that our method successfully disentangles identity from other facial attributes, surpassing existing methods, even though they require more training and supervision.
Figure 1: We attribute a single 2D image of an object (left) with depth by transporting information from a 3D shape deformation subspace learned by analyzing a network of related but different shapes (middle). For visualization, we color-code the estimated depth with values increasing from red to blue (right).

Abstract: Images, while easy to acquire, view, publish, and share, lack critical depth information. This poses a serious bottleneck for many image manipulation, editing, and retrieval tasks. In this paper we consider the problem of adding depth to an image of an object, effectively 'lifting' it back to 3D, by exploiting a collection of aligned 3D models of related object shapes. Our key insight is that, even when the imaged object is not contained in the shape collection, the network of shapes implicitly characterizes a shape-specific deformation subspace that regularizes the problem and enables robust diffusion of depth information from the shape collection to the input image. We evaluate our fully automatic approach on diverse and challenging input images, validate the results against Kinect depth readings, and demonstrate several imaging applications including depth-enhanced image editing and image relighting.
Non-oxidative, regioselective, and convergent access to densely functionalized oxazoles is realized in a functional-group-tolerant manner using alkynyl thioethers. Sulfur-terminated alkynes provide access to reactivity previously requiring strong donor-substituted alkynes such as ynamides. Sulfur does not act in an analogous donor fashion in this gold-catalyzed reaction, thus leading to complementary regioselective outcomes and addressing the limitations of using ynamides. Compared to other heteroatom-substituted alkynes, alkynyl thioethers are remarkably little explored in intermolecular late-transition-metal catalysis, despite being readily accessed and robust. [1,2] Ynamides, in contrast, are privileged substrates: in π-acid catalysis their donor nature aids metal-alkyne coordination and affords highly polarized electrophiles, thus providing the high chemo- and regioselectivity required for the discovery of efficient intermolecular reactions (Scheme 1a). [3,4] As the resulting inclusion of a donor-nitrogen atom limits the utility of the products, retaining the reactivity profile of these transformations whilst accessing more flexible and readily elaborated substitution patterns would be desirable. The value of sulfur-substituted compounds [5] coupled with progress in C–C and C-heteroatom bond formation from C–S bonds [6] renders alkynyl thioethers appealing alternatives to ynamides. Indeed the ketenethionium pathway (Scheme 1a) from alkynyl thioethers has recently been invoked in proton-catalyzed reactions with nitriles [2g,h] and gold-catalyzed reactions with sulfides. [2i] Ynamides enabled the discovery of formal [3+2] dipolar cycloadditions with nucleophilic nitrenoids, [7] thus allowing intermolecular access to α-imino gold carbene-type reactivity for heterocycle synthesis (Scheme 1b). [8,9] Such reactions, which do not depend on ynamides, are scarce.
[8b,h] A strong donor alkyne substituent proved critical in the formation of oxazoles using N-acyl pyridinium N-aminides, as electron-rich alkynes such as anisole derivatives did not react (Scheme 1b, inset). [8a,b] Oxazoles are valuable synthetic intermediates [10,11] and structural components in bioactive natural products, [12] agrochemicals, [13] ligands, [14] and functional materials. [15] Despite recent advances, a single modular and convergent route to trisubstituted oxazoles, which provides the structural and functional-group diversity needed across the 2-, 4-, and 5-positions, remains unrealized. [16] Following our interest in the use of gold catalysis with sulfides [17] we report here on the reactivity of alkynyl thioethers with nucleophilic nitrenoids to prepare oxazoles. Importantly, the regioselectivity is not consistent with a controlling ketenethionium species. The sulfur group plays an alternative role in enabling reactivity, thus proving complementary to donor-enabled approaches. The reaction of the alkynyl thioether 1a and aminide 2a (Table 1) showed that conversion into the oxazole 3 was possible at 125 °C in 1,2-dichlorobenzene (1,2-DCB; …