Ruotian Luo scite author profile

Discriminability Objective for Training Descriptive Captions

One property that remains lacking in image captions generated by contemporary methods is discriminability: the ability to tell two images apart given the caption for one of them. We propose a way to improve this aspect of caption generation. By incorporating into the captioning training objective a loss component directly related to a machine's ability to disambiguate image/caption matches, we obtain systems that produce much more discriminative captions, according to human evaluation. Remarkably, our approach also improves other aspects of the generated captions, as reflected by a battery of standard scores such as BLEU and SPICE. Our approach is modular and can be applied to a variety of model/loss combinations commonly proposed for image captioning. Code has been made available at:
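
The abstract describes adding a discriminability term to the usual captioning loss. The sketch below illustrates that idea under stated assumptions and is not the authors' code. It assumes a separate image/caption retrieval model has already scored a batch, giving a square similarity matrix with matched pairs on the diagonal. It uses a sum-of-hinges contrastive ranking loss as the discriminability term. The function names, the `margin`, and the weight `lam` are all hypothetical choices.

```python
from typing import List


def hinge_ranking_loss(sim: List[List[float]], margin: float = 0.2) -> float:
    """Sum-of-hinges contrastive loss over a square similarity matrix.

    Assumption: sim[i][j] is a (hypothetical) retrieval model's score for
    image i and caption j, and the diagonal holds the matched pairs.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # For image i, caption j should score at least `margin` below caption i.
            total += max(0.0, margin - sim[i][i] + sim[i][j])
            # For caption i, image j should score at least `margin` below image i.
            total += max(0.0, margin - sim[i][i] + sim[j][i])
    return total / n


def combined_loss(caption_loss: float, sim: List[List[float]],
                  lam: float = 1.0, margin: float = 0.2) -> float:
    """Standard captioning loss plus a weighted discriminability term (hypothetical weighting)."""
    return caption_loss + lam * hinge_ranking_loss(sim, margin)
```

Only the scalar value is computed here. How the gradient of this term reaches a caption generator that samples discrete words is a separate design question, and this sketch does not address it.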

Comprehension-Guided Referring Expressions

Luo

Shakhnarovich

2017

144

105

We consider the generation and comprehension of natural language referring expressions for objects in an image. Unlike generic "image captioning", which lacks natural standard evaluation criteria, the quality of a referring expression may be measured by the receiver's ability to correctly infer which object is being described. Following this intuition, we propose two approaches that use models trained for the comprehension task to generate better expressions. First, we use a comprehension module trained on human-generated expressions as a "critic" of the referring expression generator. The comprehension module serves as a differentiable proxy for human evaluation, providing a training signal to the generation module. Second, we use the comprehension module in a generate-and-rerank pipeline, which chooses among candidate expressions generated by a model according to their performance on the comprehension task. We show that both approaches lead to improved referring expression generation on multiple benchmark datasets.
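
The generate-and-rerank pipeline can be illustrated with a small sketch. This is illustrative only. `comprehend` stands in for a hypothetical trained comprehension model that maps an expression to a probability for each candidate object. Combining its log-probability with the generator's log-probability through a `weight` parameter is an assumption, not a detail stated in the abstract.

```python
import math
from typing import Callable, Dict, List, Tuple


def rerank(candidates: List[Tuple[str, float]],
           target: str,
           comprehend: Callable[[str], Dict[str, float]],
           weight: float = 1.0) -> str:
    """Return the candidate expression that best picks out `target`.

    candidates: (expression, generator log-probability) pairs.
    comprehend: hypothetical model mapping an expression to
                {object_id: probability that the expression refers to it}.
    weight:     assumed trade-off between comprehension and generator scores.
    """
    def score(item: Tuple[str, float]) -> float:
        expr, gen_logp = item
        p = comprehend(expr).get(target, 0.0)
        if p <= 0.0:
            # The comprehension model rules the target out entirely.
            return float("-inf")
        return gen_logp + weight * math.log(p)

    return max(candidates, key=score)[0]
```

Larger values of `weight` favor expressions that the comprehension model resolves to the target more confidently, even when the generator found them less likely.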

DIODE: A Dense Indoor and Outdoor DEpth Dataset

Igor

Kolkin

Zhang

et al. 2019

Preprint

We introduce DIODE (Dense Indoor/Outdoor DEpth), a dataset that contains thousands of diverse, high-resolution color images with accurate, dense, long-range depth measurements. DIODE is the first public dataset to include RGBD images of indoor and outdoor scenes obtained with one sensor suite. This is in contrast to existing datasets that involve just one domain/scene type and employ different sensors, making generalization across domains difficult. The dataset is available for download at diode-dataset.org.

Pixel Consensus Voting for Panoptic Segmentation

Wang

Luo

Maire

et al. 2020

Context-Aware Zero-Shot Recognition

Luo

Zhang

Han

et al. 2020

AAAI

We present a novel problem setting in zero-shot learning: zero-shot object recognition and detection in context. Traditional zero-shot learning methods simply infer unseen categories by transferring knowledge from objects belonging to semantically similar seen categories. In contrast, we aim to identify novel objects in an image surrounded by known objects, using an inter-object relation prior. Specifically, we leverage the visual context and the geometric relationships between all pairs of objects in a single image, and capture information useful for inferring unseen categories. We integrate our context-aware zero-shot learning framework seamlessly into traditional zero-shot learning techniques using a Conditional Random Field (CRF). The proposed algorithm is evaluated on both zero-shot region classification and zero-shot detection tasks. Results on the Visual Genome (VG) dataset show that the additional visual context lets our model significantly outperform traditional methods.
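
To make the CRF idea concrete, here is a toy sketch. It assumes per-object unary scores (for instance, from a conventional zero-shot classifier) and a pairwise score that rewards label combinations compatible with the objects' relationship. The function names and scoring interface are hypothetical. Exhaustive enumeration is used only because it is exact and easy to read. It is exponential in the number of objects, and the abstract does not say which inference procedure the authors use.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple


def crf_map(unary: List[Dict[str, float]],
            pairwise: Callable[[int, str, int, str], float]) -> Tuple[str, ...]:
    """Exact MAP labelling of a small fully connected CRF by enumeration.

    unary[k][label]:        hypothetical score that object k has `label`.
    pairwise(a, la, b, lb): hypothetical compatibility of labels la and lb
                            for objects a and b, given their relation.
    """
    n = len(unary)
    best: Tuple[str, ...] = ()
    best_score = float("-inf")
    # Try every joint assignment of labels to objects.
    for labels in product(*(list(u.keys()) for u in unary)):
        score = sum(unary[k][labels[k]] for k in range(n))
        # Add a context term for every unordered pair of objects.
        score += sum(pairwise(a, labels[a], b, labels[b])
                     for a in range(n) for b in range(a + 1, n))
        if score > best_score:
            best, best_score = labels, score
    return best
```

For example, suppose a confidently recognized `person` stands next to an ambiguous object. A pairwise bonus for (`person`, `horse`) can then override a slightly higher unary score for `zebra`.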
