Tong Yu scite author profile

We propose a deep learning approach for user-guided image colorization. The system directly maps a grayscale image, along with sparse, local user "hints" to an output colorization with a Convolutional Neural Network (CNN). Rather than using hand-defined rules, the network propagates user edits by fusing low-level cues along with high-level semantic information, learned from large-scale data. We train on a million images, with simulated user inputs. To guide the user towards efficient input selection, the system recommends likely colors based on the input image and current user inputs. The colorization is performed in a single feed-forward pass, enabling real-time use. Even with randomly simulated user inputs, we show that the proposed system helps novice users quickly create realistic colorizations, and offers large improvements in colorization quality with just a minute of use. In addition, we demonstrate that the framework can incorporate other user "hints" to the desired colorization, showing an application to color histogram transfer. Our code and models are available at https://richzhang.github.io/ideepcolor.

One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning

Finn

Xie

et al. 2018

206

186

Humans and animals are capable of learning a new behavior by observing others perform the skill just once. We consider the problem of allowing a robot to do the same: learning from a video of a human, even when there is domain shift in the perspective, environment, and embodiment between the robot and the observed human. Prior approaches to this problem have hand-specified how human and robot actions correspond and often relied on explicit human pose detection systems. In this work, we present an approach for one-shot learning from a video of a human by using human and robot demonstration data from a variety of previous tasks to build up prior knowledge through meta-learning. Then, combining this prior knowledge and only a single video demonstration from a human, the robot can perform the task that the human demonstrated. We show experiments on both a PR2 arm and a Sawyer arm, demonstrating that after meta-learning, the robot can learn to place, push, and pick-and-place new objects using just one video of a human performing the manipulation.

Gradient Surgery for Multi-Task Learning

Yu

Kumar

Gupta

et al. 2020

Preprint

135

The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction

et al. 2020

Real-Time User-Guided Image Colorization with Learned Deep Priors

Zhang

Zhu

Isola

et al. 2017

Preprint

ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering

et al. 2019

AAAI

124

Recent developments in modeling language and vision have been successfully applied to image question answering. It is both crucial and natural to extend this research direction to the video domain for video question answering (VideoQA). Compared to the image domain, where large-scale and fully annotated benchmark datasets exist, VideoQA datasets are limited to small scale and are automatically generated, etc. These limitations restrict their applicability in practice. Here we introduce ActivityNet-QA, a fully annotated and large-scale VideoQA dataset. The dataset consists of 58,000 QA pairs on 5,800 complex web videos derived from the popular ActivityNet dataset. We present a statistical analysis of our ActivityNet-QA dataset and conduct extensive experiments on it by comparing existing VideoQA baselines. Moreover, we explore various video representation strategies to improve VideoQA performance, especially for long videos. The dataset is available at https://github.com

Understanding and improving recurrent networks for human activity recognition by continuous attention

Zhang

Gao

et al. 2018

140

Deep neural networks, including recurrent networks, have been successfully applied to human activity recognition. Unfortunately, the final representation learned by recurrent networks might encode some noise (irrelevant signal components, unimportant sensor modalities, etc.). Besides, it is difficult to interpret the recurrent networks to gain insight into the models' behavior. To address these issues, we propose two attention models for human activity recognition: temporal attention and sensor attention. These two mechanisms adaptively focus on important signals and sensor modalities. To further improve the understandability and mean F1 score, we add continuity constraints, considering that continuous sensor signals are more robust than discrete ones. We evaluate the approaches on three datasets and obtain state-of-the-art results. Furthermore, qualitative analysis shows that the attention learned by the models agrees well with human intuition.

One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning

Finn

Xie

et al. 2018

Preprint

