2020 International Conference on Communication and Signal Processing (ICCSP) 2020
DOI: 10.1109/iccsp48568.2020.9182105
A Review on Automatic Image Captioning Techniques

Cited by 6 publications (4 citation statements); References 13 publications
“…The efficiency of the model has been compared on two different datasets, Flickr 8k and Flickr 30k, against state-of-the-art methods using different captioning metrics. The Flickr 8k and Flickr 30k datasets, as signified by their names, consist of 8,000 and 82,783 images, respectively, with five different captions for each image describing the salient entities and features [16]. Two search methods, beam search (with beamwidths of 5 and 3) and greedy search, have been computed on the output probabilities from the model to evaluate the BLEU scores for each dataset.…”
Section: Results and Analysis
confidence: 99%
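The beam and greedy decoding strategies mentioned in the citation above can be sketched as follows. This is a simplified illustration, not the cited model's implementation: it treats the per-step token distributions as fixed and independent, whereas a real caption decoder conditions each step on the partial caption generated so far.

```python
import math

def beam_search(step_probs, beam_width=5):
    """Decode a token sequence from per-step token probabilities.

    step_probs: list of dicts mapping token -> probability at each step
    (a simplification: real decoders condition each distribution on the
    tokens already generated). Returns the highest-scoring sequence.
    """
    beams = [([], 0.0)]  # (tokens so far, cumulative log-probability)
    for probs in step_probs:
        candidates = []
        for tokens, score in beams:
            for token, p in probs.items():
                candidates.append((tokens + [token], score + math.log(p)))
        # keep only the beam_width best partial captions
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][0]

def greedy_search(step_probs):
    """Greedy decoding: take the single most likely token at each step."""
    return [max(probs, key=probs.get) for probs in step_probs]
```

With beamwidth 1, beam search reduces to greedy search; wider beams (such as the 5 and 3 used above) keep several partial captions alive, which in a real conditioned decoder can recover sequences a greedy choice would discard.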
“…Faster R-CNN can also be incorporated to generate text descriptions of a specific image in sequence with LSTM and RNN, which resulted in a BLEU-1 score of about 59.8 [15]. The Flickr 30k dataset has established its importance and advantage in the field of automatic image captioning (AIC), being preferred for depicting the most remarkable data in images by finding the relationships among the different objects present in an image [16]. A hybrid bidirectional LSTM approach with CNN [17] has also shown significant outcomes for image feature extraction and captioning on the full Flickr datasets, achieving a true positive rate of 86% and a false positive rate of 10%.…”
Section: Introduction
confidence: 99%
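The true positive and false positive rates quoted for [17] follow the standard definitions from a binary confusion matrix. A minimal sketch, with hypothetical boolean predictions and labels:

```python
def tpr_fpr(predictions, labels):
    """True positive rate and false positive rate from boolean
    predictions and ground-truth labels of equal length."""
    tp = sum(1 for p, y in zip(predictions, labels) if p and y)
    fp = sum(1 for p, y in zip(predictions, labels) if p and not y)
    fn = sum(1 for p, y in zip(predictions, labels) if not p and y)
    tn = sum(1 for p, y in zip(predictions, labels) if not p and not y)
    return tp / (tp + fn), fp / (fp + tn)
```

A TPR of 86% with an FPR of 10%, as reported, means 86% of true items were recovered while only 10% of negatives were wrongly flagged.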
“…This comparison spans several evaluation metrics, including BLEU (1 to 4), CIDEr and METEOR. In [18], the paper puts the spotlight on some of the advancements in the image captioning task up to early 2020, discussing various approaches including N-cut, color-based segmentation and hybrid engines. It also discusses how model engineering and incorporating more hyper-parameters improve the overall pipeline and yield the best accuracy for such models.…”
Section: Related Work
confidence: 99%
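Of the metrics named above, BLEU-1 is the simplest: clipped unigram precision of the candidate caption against the reference captions. A minimal sketch of that core computation (hypothetical captions; it omits BLEU's brevity penalty and the higher-order n-grams used by BLEU-2 through BLEU-4):

```python
from collections import Counter

def bleu1(references, candidate):
    """Clipped unigram precision (the core of BLEU-1).

    references: list of tokenized reference captions for one image;
    candidate: tokenized candidate caption. Each candidate token is
    credited at most as many times as it appears in any single
    reference ("clipping"), so repeating a word cannot inflate the score.
    """
    cand_counts = Counter(candidate)
    max_ref = Counter()
    for ref in references:
        for tok, n in Counter(ref).items():
            max_ref[tok] = max(max_ref[tok], n)
    clipped = sum(min(n, max_ref[tok]) for tok, n in cand_counts.items())
    return clipped / len(candidate)
```

CIDEr and METEOR extend this idea with TF-IDF-weighted n-grams and synonym/stem matching, respectively, which is why reviews typically report all three.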
“…As a popular challenge involving sequence modeling, photo caption generation has attracted various state-of-the-art (SOTA) approaches. For example, a Convolutional Neural Network (CNN, also called ConvNet) is combined with a language architecture such as a Recurrent Neural Network (RNN) in a CNN-RNN-based framework [3]. This work uses the standard encoder-decoder architecture: a pre-trained CNN builds feature vectors, which are then fed into an RNN decoder that generates the language description.…”
Section: Introduction
confidence: 99%
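The encoder-decoder data flow described above can be sketched at the shape level. This is an untrained toy with random weights, standing in for the real components: the "encoder" replaces a pre-trained CNN with a random projection, and the "decoder" is a single hand-rolled RNN cell whose hidden state is initialized from the image features; the tiny vocabulary is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["<start>", "a", "dog", "runs", "<end>"]
feat_dim = hid_dim = 8          # feature vector initializes the hidden state
vocab_size = len(vocab)

# "Encoder": stand-in for a pre-trained CNN mapping an image to a
# fixed-length feature vector (here just a random projection of pixels).
W_enc = rng.normal(size=(16, feat_dim))
def encode(image_pixels):
    return np.tanh(image_pixels @ W_enc)

# "Decoder": a minimal RNN cell emitting one token per step.
W_h = rng.normal(size=(hid_dim, hid_dim))
W_x = rng.normal(size=(vocab_size, hid_dim))
W_out = rng.normal(size=(hid_dim, vocab_size))

def decode(features, max_len=5):
    h = features                 # image features seed the hidden state
    token = 0                    # index of <start>
    caption = []
    for _ in range(max_len):
        x = np.eye(vocab_size)[token]        # one-hot embedding of last token
        h = np.tanh(h @ W_h + x @ W_x)       # recurrent update
        token = int(np.argmax(h @ W_out))    # greedy token choice
        if vocab[token] == "<end>":
            break
        caption.append(vocab[token])
    return caption

image = rng.normal(size=16)
caption = decode(encode(image))
```

In a trained system the random matrices would be learned end to end (or the CNN taken from a pre-trained classifier), and the greedy `argmax` would typically be replaced by the beam search discussed earlier.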