This paper attacks the challenging problem of zero-example video retrieval. In this retrieval paradigm, an end user searches for unlabeled videos by ad-hoc queries described in natural language text, with no visual example provided. Given videos as sequences of frames and queries as sequences of words, effective sequence-to-sequence cross-modal matching is required. The majority of existing methods are concept based, extracting relevant concepts from queries and videos and accordingly establishing associations between the two modalities. In contrast, this paper takes a concept-free approach, proposing a dual deep encoding network that encodes videos and queries into powerful dense representations of their own. Dual encoding is conceptually simple, practically effective and end-to-end. As experiments on three benchmarks, i.e., MSR-VTT and the TRECVID 2016 and 2017 Ad-hoc Video Search tasks, show, the proposed solution establishes a new state of the art for zero-example video retrieval.
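To make the dual-encoding idea concrete, the following is a minimal sketch of a two-branch encoder for video-text matching. It assumes pre-extracted frame features and an indexed vocabulary; the class names, layer choices, and sizes (VideoEncoder, TextEncoder, 2048-d frame features, a 256-d common space) are illustrative assumptions, not the paper's exact multi-level architecture.

```python
# A hedged sketch of dual encoding for zero-example video retrieval:
# two independent branches map videos and queries into a shared space,
# where cosine similarity ranks candidate videos for a text query.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VideoEncoder(nn.Module):
    def __init__(self, frame_dim=2048, hidden=512, embed=256):
        super().__init__()
        self.gru = nn.GRU(frame_dim, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, embed)

    def forward(self, frames):                   # frames: (B, T, frame_dim)
        states, _ = self.gru(frames)             # (B, T, 2*hidden)
        pooled = states.mean(dim=1)              # temporal mean pooling
        return F.normalize(self.proj(pooled), dim=-1)

class TextEncoder(nn.Module):
    def __init__(self, vocab=10000, word_dim=300, hidden=512, embed=256):
        super().__init__()
        self.emb = nn.Embedding(vocab, word_dim)
        self.gru = nn.GRU(word_dim, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, embed)

    def forward(self, tokens):                   # tokens: (B, L) word indices
        states, _ = self.gru(self.emb(tokens))   # (B, L, 2*hidden)
        pooled = states.mean(dim=1)
        return F.normalize(self.proj(pooled), dim=-1)

# Retrieval: encode both modalities, then rank videos by cosine similarity.
video_enc, text_enc = VideoEncoder(), TextEncoder()
frames = torch.randn(4, 30, 2048)                # 4 videos, 30 frames each
query = torch.randint(0, 10000, (1, 12))         # one 12-word query
scores = text_enc(query) @ video_enc(frames).T   # (1, 4) similarity scores
ranking = scores.argsort(descending=True)        # best-matching video first
```

Because the two branches share no weights, video embeddings can be computed offline and indexed once, so an ad-hoc query only requires a single pass through the text branch at search time.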
This paper contributes to cross-lingual image annotation and retrieval in terms of data and baseline methods. We propose COCO-CN, a novel dataset enriching MS-COCO with manually written Chinese sentences and tags. For more effective annotation acquisition, we develop a recommendation-assisted collective annotation system, automatically providing an annotator with several tags and sentences deemed to be relevant with respect to the pictorial content. Having 20,342 images annotated with 27,218 Chinese sentences and 70,993 tags, COCO-CN is currently the largest Chinese-English dataset that provides a unified and challenging platform for cross-lingual image tagging, captioning and retrieval. We develop conceptually simple yet effective methods per task for learning from cross-lingual resources. Extensive experiments on the three tasks justify the viability of the proposed dataset and methods. Data and code are publicly available at https://github.com/li-xirong/coco-cn.