2013 IEEE Conference on Computer Vision and Pattern Recognition
DOI: 10.1109/cvpr.2013.80

Watching Unlabeled Video Helps Learn New Human Actions from Very Few Labeled Snapshots

Abstract: We

Cited by 33 publications (22 citation statements)
References 33 publications
“…Our work also targets action recognition in static images, but, unlike any of the above, we equip static images with dynamics learned from videos. To our knowledge, the only prior static-image approach to explicitly leverage video dynamics is [4]. However, whereas [4] leverages video to augment training images for the low-shot learning scenario, our method leverages video as a motion prior that enhances test observations.…”
Section: Related Work
confidence: 99%
“…Our research is closely related to the recent work on visual data collection from web images [42,3,8,14] or weakly annotated videos [2]. Their goal is to collect training images from the Internet with minimum human supervision, but for predefined concepts.…”
Section: Related Work
confidence: 99%
“…Recently, it has become popular to address challenging computer vision problems by leveraging both images and videos. Powerful new algorithms have been developed by pursuing synergistic interplay between these two complementary domains of information, especially in the areas of adapting object detectors between images and videos [16,20], human activity recognition [3], and event detection [4]. However, storyline reconstruction from both images and videos remains a novel and largely under-addressed problem.…”
Section: Previous Work
confidence: 99%