Noise Augmented Double-Stream Graph Convolutional Networks for Image Captioning

Wu, Lingxiang; Xu, Min; Sang, Lei; Yao, Ting; Mei, Tao

doi:10.1109/tcsvt.2020.3036860

Cited by 36 publications

(4 citation statements)

References 47 publications

“…Among the list of the current state-of-the-art works in the field of image captioning which also resembles our idea up to a certain extent only is the use of graph convolution neural networks to understand the global and regional context of an image and its objects [23,24,25]. The graph convolution neural networks are used to understand the semantic and spatial relationship which helps the captioning model to generate spatial tokens e.g., towards, inside, near, etc.…”

Section: Related Work (mentioning)

confidence: 93%

Image Captioning With Positional and Geometrical Semantics

Haque, Ghani, Saeed (2021). IEEE Access

The last five to six years have seen tremendous progress in automatic image captioning using deep learning. Initial research focused on attribute-to-attribute comparison of image features and text to describe an image as a sentence, whereas current research addresses issues related to semantics and correlations. However, current state-of-the-art research lacks adequate concepts for positional and geometrical attributes. Most work relying on CNNs (Convolutional Neural Networks) for object feature extraction does not account for equivariance and rotational invariance, which leads to an orientation-less understanding of objects for captioning, along with longer training times and larger datasets. Furthermore, CNN-based image captioning encoders fail to understand the geometrical alignment of object attributes within the image and hence mislabel distorted objects as correct. To address these issues, we propose ICPS (Image Captioning with Positional and geometrical Semantics), a capsule network-based image captioning technique that uses transformer neural networks as the decoder. The proposed ICPS architecture handles various geometrical properties of image objects with the help of parallelized capsules, while the object-to-text decoding is done by transformer neural networks. The inclusion of cluster capsules provides better object understanding in terms of position, equivariance, and geometrical orientation, with more augmented object understanding over a small dataset in comparatively less time. The extracted image features provide a better understanding of image objects and help the decoding stage narrate effectively with positional and geometrical details. We trained and tested ICPS on the Flickr8k dataset and found that it captions better than other current state-of-the-art research when describing positional and geometrical transitions.

“…After that, Anderson et al [3] propose to use Faster R-CNN [36] as encoder and achieve significant improvement. Some subsequent works [4], [5], [8], [37], [38] follow this paradigm. Recently, transformer-based models have demonstrated excellent performance in image captioning task [5], [7], [39]-[41].…”

Section: Introduction (mentioning)

confidence: 97%

LCM-Captioner: A lightweight text-based image captioning method with collaborative mechanism between vision and text

Wang, Deng, et al. (2023). Neural Networks

“…For difficulty 1), encouraging by the method (Wu et al, 2021b), the noise can be injected into RNN hidden states to predict the mean and standard deviation, and manipulate the RNN transition states. In this way, the network robustness can be significantly enhanced and the issue can be well solved.…”

Section: Introduction (mentioning)

confidence: 99%

Learning joint relationship attention network for image captioning

Wang (2023). Expert Systems with Applications
