Unpaired Image Captioning via Scene Graph Alignments

Gu, Jiuxiang; Joty, Shafiq; Cai, Jianfei; Zhao, Handong; Xu, Yuebin; Wang, Gang

doi:10.1109/iccv.2019.01042

Cited by 139 publications

(115 citation statements)

References 35 publications

Supporting

Mentioning

115

Contrasting

Order By: Relevance

“…Scene segmentation (or scene parsing, semantic segmentation) is one of the fundamental problems in computer vision and has drawn lots of attentions. Recently, thanks to the great success of Convolutional Neural Networks (CNNs) in computer vision [42,68,71,52,25,72,27,80,26], lots of CNNs based segmentation works have been proposed and have achieved great progress [29,22,81,83,84,70,60]. For example, Long et al [54] introduce the fully convolutional networks (FCN) in which the fully connected layers in standard CNNs are transformed to convolutional layers.…”

Section: Related Work 21 Scene Segmentationmentioning

confidence: 99%

Boundary-Aware Feature Propagation for Scene Segmentation

Ding

Jiang

Liu

et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

Self Cite

228

107

View full text Add to dashboard Cite

In this work, we address the challenging issue of scene segmentation. To increase the feature similarity of the same object while keeping the feature discrimination of different objects, we explore to propagate information throughout the image under the control of objects' boundaries. To this end, we first propose to learn the boundary as an additional semantic class to enable the network to be aware of the boundary layout. Then, we propose unidirectional acyclic graphs (UAGs) to model the function of undirected cyclic graphs (UCGs), which structurize the image via building graphic pixel-by-pixel connections, in an efficient and effective way. Furthermore, we propose a boundaryaware feature propagation (BFP) module to harvest and propagate the local features within their regions isolated by the learned boundaries in the UAG-structured image. The proposed BFP is capable of splitting the feature propagation into a set of semantic groups via building strong connections among the same segment region but weak connections between different segment regions. Without bells and whistles, our approach achieves new state-of-the-art segmentation performance on three challenging semantic segmentation datasets, i.e., PASCAL-Context, CamVid, and Cityscapes.

show abstract

Section: Related Work 21 Scene Segmentationmentioning

confidence: 99%

Boundary-Aware Feature Propagation for Scene Segmentation

Ding

Jiang

Liu

et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

Self Cite

228

107

View full text Add to dashboard Cite

show abstract

“…Generally, a convolutional neural network (CNN) is tasked with feature extraction, while a recurrent neural network (RNN) relies upon to translate the training annotations with the image features [115]. Aside from determining and extracting salient and intricate details in an image, it is equally important to extract the interactions and semantic relationship between such objects and how to illustrate them in the right manner using appropriate tenses and sentence structures [116]. Also, because the training 6 Complexity labels which are texts are different from the features obtained from the images, language model techniques are required to analyze the form, meaning, and context of a sequence of words.…”

Section: Image Captioningmentioning

confidence: 99%

Features to Text: A Comprehensive Survey of Deep Learning on Semantic Segmentation and Image Captioning

et al. 2021

View full text Add to dashboard Cite

With the emergence of deep learning, computer vision has witnessed extensive advancement and has seen immense applications in multiple domains. Specifically, image captioning has become an attractive focal direction for most machine learning experts, which includes the prerequisite of object identification, location, and semantic understanding. In this paper, semantic segmentation and image captioning are comprehensively investigated based on traditional and state-of-the-art methodologies. In this survey, we deliberate on the use of deep learning techniques on the segmentation analysis of both 2D and 3D images using a fully convolutional network and other high-level hierarchical feature extraction methods. First, each domain’s preliminaries and concept are described, and then semantic segmentation is discussed alongside its relevant features, available datasets, and evaluation criteria. Also, the semantic information capturing of objects and their attributes is presented in relation to their annotation generation. Finally, analysis of the existing methods, their contributions, and relevance are highlighted, informing the importance of these methods and illuminating a possible research continuation for the application of semantic image segmentation and image captioning approaches.

show abstract

“…Wang et al (2019b) show that image scene graphs extracted using a trained model can match the captioning performance of an oracle with access to ground-truth graphs. Aligning text-and imagebased scene graphs has also been used to generate image captions without paired data (Gu et al, 2019).…”

Section: Related Workmentioning

confidence: 99%

Diverse and Relevant Visual Storytelling with Scene Graph Embeddings

Hong¹,

Shetty²,

Sayeed³

et al. 2020

Proceedings of the 24th Conference on Computational Natural Language Learning

View full text Add to dashboard Cite

A problem in automatically generated stories for image sequences is that they use overly generic vocabulary and phrase structure and fail to match the distributional characteristics of human-generated text. We address this problem by introducing explicit representations for objects and their relations by extracting scene graphs from the images. Utilizing an embedding of this scene graph enables our model to more explicitly reason over objects and their relations during story generation, compared to the global features from an object classifier used in previous work. We apply metrics that account for the diversity of words and phrases of generated stories as well as for reference to narratively-salient image features and show that our approach outperforms previous systems. Our experiments also indicate that our models obtain competitive results on reference-based metrics.

show abstract

Unpaired Image Captioning via Scene Graph Alignments

Cited by 139 publications

References 35 publications

Boundary-Aware Feature Propagation for Scene Segmentation

Boundary-Aware Feature Propagation for Scene Segmentation

Features to Text: A Comprehensive Survey of Deep Learning on Semantic Segmentation and Image Captioning

Diverse and Relevant Visual Storytelling with Scene Graph Embeddings

Contact Info

Product

Resources

About