2021
DOI: 10.14569/ijacsa.2021.0120287

A Hybridized Deep Learning Method for Bengali Image Captioning

Abstract: An omnipresent and challenging research topic in computer vision is the generation of captions from an input image. Numerous experiments have previously been conducted on image captioning in English, but caption generation from images in Bengali is still sparse and in need of refinement. Only a few papers to date have worked on image captioning in Bengali. Hence, we proffer a standard strategy for Bengali image caption generation on two different sizes of the Flickr8k dataset and the BanglaLekha data…

Cited by 18 publications (12 citation statements)
References: 27 publications
“…In the literature, the vision encoder is designed using stacked Convolutional Neural Networks (CNNs) [1] and graph-based networks [2]. Moreover, various pre-trained feature extractors such as VGG-16, InceptionResnetV2, and Xception have been deployed for vision encoding [3], [12]. The language decoder is implemented using variations of Recurrent Neural Networks (RNNs) such as LSTMs and GRUs [2]. In addition, self-attention-based transformer models are utilised to design the language decoder [13].…”
Section: Literature Review
confidence: 99%
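The works cited above share a common encoder-decoder pattern: a pre-trained CNN encodes the image into a feature vector and an RNN decodes the caption word by word. Below is a minimal Keras sketch of that pattern; the vocabulary size, caption length, and layer widths are placeholder assumptions, not values taken from the paper.

```python
# Minimal CNN-encoder / LSTM-decoder captioning sketch (assumed shapes and sizes).
from tensorflow.keras.applications import Xception
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model

VOCAB_SIZE = 8000   # assumed vocabulary size
MAX_LEN = 35        # assumed maximum caption length

# Vision encoder: a pre-trained CNN such as Xception, used as a fixed feature
# extractor (2048-d global-average-pooled vector, typically pre-computed offline).
feature_extractor = Xception(include_top=False, weights='imagenet', pooling='avg')

# Image branch: project the 2048-d CNN feature into the decoder's hidden size.
image_input = Input(shape=(2048,))
img_branch = Dropout(0.5)(image_input)
img_branch = Dense(256, activation='relu')(img_branch)

# Language decoder: embedding + LSTM over the partial caption seen so far.
caption_input = Input(shape=(MAX_LEN,))
seq_branch = Embedding(VOCAB_SIZE, 256, mask_zero=True)(caption_input)
seq_branch = Dropout(0.5)(seq_branch)
seq_branch = LSTM(256)(seq_branch)

# Merge the two representations and predict the next word of the caption.
merged = add([img_branch, seq_branch])
merged = Dense(256, activation='relu')(merged)
output = Dense(VOCAB_SIZE, activation='softmax')(merged)

model = Model(inputs=[image_input, caption_input], outputs=output)
model.compile(loss='categorical_crossentropy', optimizer='adam')
```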
“…Researchers did not address automatic image captioning in Bangla for a long time due to the lack of an enriched dataset. After the development of the required datasets, several studies have been conducted on Bangla caption generation from visual images [3], [4], [5], [6], [7].…”
Section: Introduction
confidence: 99%
“…Furthermore, [12] utilized the BNLIT dataset to implement a CNN-RNN model in which both a BRNN and an LSTM were used as the RNN. Humaira et al. [3] proposed a hybridized encoder-decoder approach in which two word embeddings, fastText and GloVe, were concatenated. They also utilized beam search and greedy search to compute the BLEU scores.…”
Section: Image Captioning in Bengali
confidence: 99%
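In the spirit of the hybridized approach described in the statement above, the sketch below builds a single embedding matrix by concatenating fastText and GloVe vectors for each word. The file paths, embedding dimensions, and word-index mapping are illustrative assumptions rather than the paper's exact setup.

```python
# Sketch: concatenate fastText and GloVe vectors into one hybrid embedding matrix
# (paths, dimensions, and the word_index mapping are assumptions for illustration).
import numpy as np

FASTTEXT_DIM, GLOVE_DIM = 300, 300

def load_vectors(path, dim):
    """Read a whitespace-separated embedding file into a dict of word -> vector."""
    vectors = {}
    with open(path, encoding='utf-8') as f:
        for line in f:
            parts = line.rstrip().split(' ')
            if len(parts) != dim + 1:
                continue  # skip header or malformed lines
            vectors[parts[0]] = np.asarray(parts[1:], dtype='float32')
    return vectors

def build_embedding_matrix(word_index, fasttext_path, glove_path):
    """word_index maps each vocabulary word to an integer id starting at 1."""
    fasttext = load_vectors(fasttext_path, FASTTEXT_DIM)
    glove = load_vectors(glove_path, GLOVE_DIM)
    matrix = np.zeros((len(word_index) + 1, FASTTEXT_DIM + GLOVE_DIM), dtype='float32')
    for word, idx in word_index.items():
        ft_vec = fasttext.get(word, np.zeros(FASTTEXT_DIM, dtype='float32'))
        gl_vec = glove.get(word, np.zeros(GLOVE_DIM, dtype='float32'))
        matrix[idx] = np.concatenate([ft_vec, gl_vec])  # 600-d hybrid vector
    return matrix
```

The resulting matrix could then initialise a 600-wide Embedding layer of the decoder, either kept frozen or fine-tuned during training.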
“…For image captioning in Bengali, those 40455 captions were converted to the Bengali language using Google Translate. Unfortunately, some of the translated captions were syntactically incorrect, as shown in Fig. 5.…”
Section: Flickr8k_bn
confidence: 99%
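As a rough illustration of how such a translation step could be scripted, the sketch below machine-translates tab-separated English Flickr8k captions to Bengali using the third-party deep_translator package; the file names and caption file layout are placeholder assumptions, and, as the statement above notes, automatically translated captions still require manual correction.

```python
# Sketch: machine-translate English Flickr8k captions to Bengali
# (file names and the "image_id<TAB>caption" layout are assumed for illustration).
from deep_translator import GoogleTranslator

translator = GoogleTranslator(source='en', target='bn')

with open('flickr8k_captions_en.txt', encoding='utf-8') as src, \
        open('flickr8k_captions_bn.txt', 'w', encoding='utf-8') as dst:
    for line in src:
        image_id, caption = line.rstrip('\n').split('\t', 1)
        bengali = translator.translate(caption)  # one request per caption
        dst.write(f'{image_id}\t{bengali}\n')
```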