From Traditional to Modern: Domain Adaptation for Action Classification in Short Social Video Clips

Singh, Aditya; Saini, Saurabh; Shah, Rajvi; Narayanan, P. J.

doi:10.1007/978-3-319-45886-1_20

Cited by 1 publication

(2 citation statements)

References 28 publications


“…Similar to all these approaches we also used a neural network for learning embedding function but we focus only on hash tags. [16] maps the video representations to semantic space for improving action classification. Recent image captioning methods [23,6] have shown remarkable progress in generating rich descriptive sentences for natural images.…”

Section: Visual-semantic Joint Embedding (mentioning)

confidence: 99%


Learning to hash-tag videos with Tag2Vec

Singh, Saini, Shah, et al. 2016

Proceedings of the Tenth Indian Conference on Computer Vision, Graphics and Image Processing

Self Cite


@research.,pjn@}iiit.ac.in
http://cvit.iiit.ac.in/research/projects/tag2vec

Figure 1. Learning a direct mapping from videos to hash-tags: sample frames from short video clips with user-given hash-tags (left); a sample frame from a query video and hash-tags suggested by our system for this query (right).

ABSTRACT

User-given tags or labels are valuable resources for semantic understanding of visual media such as images and videos. Recently, a new type of labeling mechanism known as hash-tags has become increasingly popular on social media sites. In this paper, we study the problem of generating relevant and useful hash-tags for short video clips. Traditional data-driven approaches for tag enrichment and recommendation use direct visual similarity for label transfer and propagation. We attempt to learn a direct low-cost mapping from video to hash-tags using a two-step training process. We first employ a natural language processing (NLP) technique, skip-gram models with neural network training, to learn a low-dimensional vector representation of hash-tags (Tag2Vec) using a corpus of ∼10 million hash-tags. We then train an embedding function to map video features to the low-dimensional Tag2Vec space. We learn this embedding for 29 categories of short video clips with hash-tags. A query video without any tag information can then be directly mapped to the vector space of tags using the learned embedding, and relevant tags can be found by performing a simple nearest-neighbor retrieval in the Tag2Vec space. We validate the relevance of the tags suggested by our system qualitatively and quantitatively with a user study.
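The retrieval step described in the abstract (map a query video into the tag vector space, then return its nearest tags) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration only: the tag vectors are toy values rather than skip-gram embeddings, the fixed linear map `WEIGHTS` stands in for the trained embedding function, and the names `embed_video` and `suggest_tags` are hypothetical. Cosine similarity is used as the nearest-neighbor measure, which the abstract does not specify.

```python
import math

# Toy 2-D "tag space". In the paper these vectors would come from a
# skip-gram model trained on a hash-tag corpus; these values are made up.
TAG_VECTORS = {
    "#skate": [0.9, 0.1],
    "#surf": [0.8, 0.3],
    "#cooking": [0.0, 1.0],
}

# Hypothetical stand-in for the learned embedding function: a fixed linear
# map from 3-D video features to the 2-D tag space (one row per output dim).
WEIGHTS = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def embed_video(features, weights):
    """Project video features into the tag space with a linear map."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def suggest_tags(video_vec, tag_vectors, k=3):
    """Return the k tags whose vectors are most similar to video_vec."""
    ranked = sorted(
        tag_vectors,
        key=lambda tag: cosine(video_vec, tag_vectors[tag]),
        reverse=True,
    )
    return ranked[:k]


# A query video with no tag information, represented by made-up features.
query_features = [0.9, 0.05, 0.4]
query_vec = embed_video(query_features, WEIGHTS)
print(suggest_tags(query_vec, TAG_VECTORS, k=2))
```

With these toy values the query lands closest to `#skate` and `#surf`. A brute-force scan like this is only practical for a small tag vocabulary; a larger one would likely call for an approximate nearest-neighbor index.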


Section: Visual-semantic Joint Embedding (mentioning)

confidence: 99%

“…Vector representation of words is used by many recent approaches for joint visual semantic learning [17,16,23,27,6]. Socher et al [17] learns an embedding function which performs a mapping of image features to semantic word space.…”

Section: Visual-semantic Joint Embedding (mentioning)

confidence: 99%