Zhongzheng Ren scite author profile

In human learning, it is common to use multiple sources of information jointly. However, most existing feature learning approaches learn from only a single task. In this paper, we propose a novel multi-task deep network to learn generalizable high-level visual representations. Since multi-task learning requires annotations for multiple properties of the same training instance, we look to synthetic images to train our network. To overcome the domain difference between real and synthetic data, we employ an unsupervised feature space domain adaptation method based on adversarial learning. Given an input synthetic RGB image, our network simultaneously predicts its surface normal, depth, and instance contour, while also minimizing the feature space domain differences between real and synthetic data. Through extensive experiments, we demonstrate that our network learns more transferable representations compared to single-task baselines. Our learned representation produces state-of-the-art transfer learning results on PASCAL VOC 2007 classification and 2012 detection.

Learning to Anonymize Faces for Privacy Preserving Action Detection

Ren

Lee

Ryoo³

2018

151

UFO²: A Unified Framework Towards Omni-supervised Object Detection

Ren

Yang

et al. 2020

Existing work on object detection often relies on a single form of annotation: the model is trained using either accurate yet costly bounding boxes or cheaper but less expressive image-level tags. However, real-world annotations are often diverse in form, which challenges these existing works. In this paper, we present UFO², a unified object detection framework that can handle different forms of supervision simultaneously. Specifically, UFO² incorporates strong supervision (e.g., boxes), various forms of partial supervision (e.g., class tags, points, and scribbles), and unlabeled data. Through rigorous evaluations, we demonstrate that each form of label can be utilized to either train a model from scratch or to further improve a pre-trained model. We also use UFO² to investigate budget-aware omni-supervised learning, i.e., various annotation policies are studied under a fixed annotation budget: we show that competitive performance needs no strong labels for all data. Finally, we demonstrate the generalization of UFO², detecting more than 1,000 different objects without bounding box annotations.

Semantic Tracklets: An Object-Centric Representation for Visual Multi-Agent Reinforcement Learning

Liu

Ren

Yeh

et al. 2021

3D Spatial Recognition without Spatially Labeled 3D

Ren

Misra

Schwing

et al. 2021

Instance-aware, Context-focused, and Memory-efficient Weakly Supervised Object Detection

Ren

Yang

et al. 2020

Preprint

Occupancy Planes for Single-View RGB-D Human Reconstruction

Zhao¹,

Hu²,

Ren³

et al. 2023

AAAI

Single-view RGB-D human reconstruction with implicit functions is often formulated as per-point classification. Specifically, a set of 3D locations within the view frustum of the camera are first projected independently onto the image and a corresponding feature is subsequently extracted for each 3D location. The feature of each 3D location is then used to classify independently whether the corresponding 3D point is inside or outside the observed object. This procedure leads to sub-optimal results because correlations between predictions for neighboring locations are only taken into account implicitly via the extracted features. For more accurate results, we propose the occupancy planes (OPlanes) representation, which allows single-view RGB-D human reconstruction to be formulated as occupancy prediction on planes that slice through the camera's view frustum. Such a representation provides more flexibility than voxel grids and leverages correlations better than per-point classification. On the challenging S3D data, we observe that a simple classifier based on the OPlanes representation yields compelling results, especially in difficult situations with partial occlusions due to other objects and partial visibility, which have not been addressed by prior work.

Cross-Domain Self-Supervised Multi-task Feature Learning Using Synthetic Imagery
