2021
DOI: 10.14569/ijacsa.2021.0120287

A Hybridized Deep Learning Method for Bengali Image Captioning

Abstract: An omnipresent and challenging research topic in computer vision is the generation of captions from an input image. Numerous experiments have previously been conducted on image captioning in English, but caption generation from images in Bengali is still sparse and in need of refinement. Only a few papers to date have worked on image captioning in Bengali. Hence, we proffer a standard strategy for Bengali image caption generation on two different sizes of the Flickr8k dataset and the BanglaLekha data…

Cited by 18 publications (12 citation statements)
References: 27 publications
“…In the literature, the vision encoder is designed using stacked Convolutional Neural Networks (CNNs) [1] and graph-based networks [2]. Moreover, various pre-trained feature extractors such as VGG-16, InceptionResnetV2, and Xception have been deployed for vision encoding [3], [12]. The language decoder is implemented using variations of Recurrent Neural Networks (RNNs) such as LSTMs and GRUs [2]. In addition, self-attention-based transformer models are utilised to design the language decoder [13].…”
Section: Literature Review
confidence: 99%
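The works cited above share a common encoder-decoder pattern: a pre-trained CNN encodes the image into a feature vector and an RNN decodes the caption word by word. Below is a minimal Keras sketch of that pattern; the vocabulary size, caption length, and layer widths are placeholder assumptions, not values taken from the paper.

```python
# Minimal CNN-encoder / LSTM-decoder captioning sketch (assumed shapes and sizes).
from tensorflow.keras.applications import Xception
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model

VOCAB_SIZE = 8000   # assumed vocabulary size
MAX_LEN = 35        # assumed maximum caption length

# Vision encoder: a pre-trained CNN such as Xception, used as a fixed feature
# extractor (2048-d global-average-pooled vector, typically pre-computed offline).
feature_extractor = Xception(include_top=False, weights='imagenet', pooling='avg')

# Image branch: project the 2048-d CNN feature into the decoder's hidden size.
image_input = Input(shape=(2048,))
img_branch = Dropout(0.5)(image_input)
img_branch = Dense(256, activation='relu')(img_branch)

# Language decoder: embedding + LSTM over the partial caption seen so far.
caption_input = Input(shape=(MAX_LEN,))
seq_branch = Embedding(VOCAB_SIZE, 256, mask_zero=True)(caption_input)
seq_branch = Dropout(0.5)(seq_branch)
seq_branch = LSTM(256)(seq_branch)

# Merge the two representations and predict the next word of the caption.
merged = add([img_branch, seq_branch])
merged = Dense(256, activation='relu')(merged)
output = Dense(VOCAB_SIZE, activation='softmax')(merged)

model = Model(inputs=[image_input, caption_input], outputs=output)
model.compile(loss='categorical_crossentropy', optimizer='adam')
```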
“…Researchers did not address automatic image captioning in Bangla for a long time due to the lack of an enriched dataset. After the development of the required datasets, several studies have been conducted on Bangla caption generation from visual images [3], [4], [5], [6], [7].…”
Section: Introduction
confidence: 99%
“…Furthermore, [12] utilized the BNLIT dataset to implement a CNN-RNN model in which both a BRNN and an LSTM were used as the RNN. Humaira et al. [3] proposed a hybridized encoder-decoder approach in which two word embeddings, fastText and GloVe, were concatenated. They also utilized beam search and greedy search to compute the BLEU scores.…”
Section: Image Captioning in Bengali
confidence: 99%
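In the spirit of the hybridized approach described in the statement above, the sketch below builds a single embedding matrix by concatenating fastText and GloVe vectors for each word. The file paths, embedding dimensions, and word-index mapping are illustrative assumptions rather than the paper's exact setup.

```python
# Sketch: concatenate fastText and GloVe vectors into one hybrid embedding matrix
# (paths, dimensions, and the word_index mapping are assumptions for illustration).
import numpy as np

FASTTEXT_DIM, GLOVE_DIM = 300, 300

def load_vectors(path, dim):
    """Read a whitespace-separated embedding file into a dict of word -> vector."""
    vectors = {}
    with open(path, encoding='utf-8') as f:
        for line in f:
            parts = line.rstrip().split(' ')
            if len(parts) != dim + 1:
                continue  # skip header or malformed lines
            vectors[parts[0]] = np.asarray(parts[1:], dtype='float32')
    return vectors

def build_embedding_matrix(word_index, fasttext_path, glove_path):
    """word_index maps each vocabulary word to an integer id starting at 1."""
    fasttext = load_vectors(fasttext_path, FASTTEXT_DIM)
    glove = load_vectors(glove_path, GLOVE_DIM)
    matrix = np.zeros((len(word_index) + 1, FASTTEXT_DIM + GLOVE_DIM), dtype='float32')
    for word, idx in word_index.items():
        ft_vec = fasttext.get(word, np.zeros(FASTTEXT_DIM, dtype='float32'))
        gl_vec = glove.get(word, np.zeros(GLOVE_DIM, dtype='float32'))
        matrix[idx] = np.concatenate([ft_vec, gl_vec])  # 600-d hybrid vector
    return matrix
```

The resulting matrix could then initialise a 600-wide Embedding layer of the decoder, either kept frozen or fine-tuned during training.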
“…For image captioning in Bengali, those 40455 captions were converted to the Bengali language using Google Translate. Unfortunately, some of the translated captions were syntactically incorrect, as shown in Fig. 5.…”
Section: Flickr8k_bn
confidence: 99%
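As a rough illustration of how such a translation step could be scripted, the sketch below machine-translates tab-separated English Flickr8k captions to Bengali using the third-party deep_translator package; the file names and caption file layout are placeholder assumptions, and, as the statement above notes, automatically translated captions still require manual correction.

```python
# Sketch: machine-translate English Flickr8k captions to Bengali
# (file names and the "image_id<TAB>caption" layout are assumed for illustration).
from deep_translator import GoogleTranslator

translator = GoogleTranslator(source='en', target='bn')

with open('flickr8k_captions_en.txt', encoding='utf-8') as src, \
        open('flickr8k_captions_bn.txt', 'w', encoding='utf-8') as dst:
    for line in src:
        image_id, caption = line.rstrip('\n').split('\t', 1)
        bengali = translator.translate(caption)  # one request per caption
        dst.write(f'{image_id}\t{bengali}\n')
```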