Thirteenth International Conference on Digital Image Processing (ICDIP 2021)
DOI: 10.1117/12.2600465

Bidirectional LSTM approach to image captioning with scene features

Abstract: I hereby certify that the information contained in this (my submission) is information pertaining to research I conducted for this project. All information other than my own contribution will be fully referenced and listed in the relevant bibliography section at the rear of the project. ALL internet material must be referenced in the bibliography section. Students are required to use the Referencing Standard specified in the report template. To use other author's written or electronic work is illegal (plagiari…

Cited by 5 publications (4 citation statements) · References 23 publications
“…It produces around 73.80% accuracy for the cosine similarity algorithm. In another experiment we performed a comparative analysis with CNN-LSTM (4) and CNN-Bi-LSTM (5) as the existing systems for caption generation. According to Figure 9, we conclude that our system predicts superior results in terms of all performance parameters.…”
Section: Results
confidence: 99%
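The 73.80% figure quoted above is an evaluation score reported by the citing work. As a rough illustration of the cosine similarity metric itself (not of that system), the sketch below compares a generated caption with a reference caption; the bag-of-words vectorization via scikit-learn is an illustrative assumption, not a detail taken from the cited papers.

```python
# Minimal sketch: cosine similarity between a generated and a reference caption.
# The bag-of-words vectorization is an assumption for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

generated = "a dog runs across the green field"
reference = "a dog is running through a grassy field"

# Build term-count vectors for both captions, then take their cosine similarity.
vectors = CountVectorizer().fit_transform([generated, reference])
score = cosine_similarity(vectors[0], vectors[1])[0, 0]
print(f"cosine similarity: {score:.3f}")
```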
“…According to Figure 9, we conclude that our system predicts superior results in terms of all performance parameters. The proposed algorithm is compared with the deep CNN (4) and CNN-Bi-LSTM (5) algorithms. We also evaluate some machine learning algorithms such as Naive Bayes, Random Forest, and Support Vector Machine. Here we conclude that the performance of the system is better compared to the existing systems.…”
Section: Results
confidence: 99%
“…Pretrained CNN models are used for the encoder, while long short-term memory (LSTM) or gated recurrent unit (GRU) neural networks are commonly used for language generation. However, the encoder-decoder method is limited in its ability to preserve all source information in the fixed-length vector, and the unidirectional LSTM decoder only preserves past information, which leads to poor outcomes for long sequential data [9].…”
Section: Introduction
confidence: 99%
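The excerpt above describes the standard encoder-decoder captioning pipeline (a pretrained CNN encoder producing a fixed-length image vector, plus a recurrent decoder) and the limitation of a unidirectional LSTM. The Keras sketch below is a minimal illustration of that setup with a bidirectional LSTM over the caption tokens; the backbone choice (InceptionV3), vocabulary size, embedding width, and maximum caption length are assumed values, not details taken from the cited papers.

```python
# Minimal sketch (assumed hyperparameters, not the authors' exact architecture):
# a pretrained CNN encoder and a bidirectional LSTM over the partial caption.
import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB_SIZE = 10000   # assumed vocabulary size
MAX_LEN = 30         # assumed maximum caption length
EMBED_DIM = 256

# Encoder: pretrained CNN, global-average-pooled into a fixed-length vector.
cnn = tf.keras.applications.InceptionV3(include_top=False, pooling="avg",
                                        weights="imagenet")
image_input = layers.Input(shape=(299, 299, 3))
image_vec = layers.Dense(EMBED_DIM, activation="relu")(cnn(image_input))

# Decoder: bidirectional LSTM over the caption tokens, merged with the image vector.
caption_input = layers.Input(shape=(MAX_LEN,))
x = layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(caption_input)
x = layers.Bidirectional(layers.LSTM(128))(x)   # 128 forward + 128 backward = 256
merged = layers.add([image_vec, x])
hidden = layers.Dense(256, activation="relu")(merged)
output = layers.Dense(VOCAB_SIZE, activation="softmax")(hidden)

model = Model(inputs=[image_input, caption_input], outputs=output)
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
```

In this kind of setup the model is trained to predict the next caption token given the image vector and the tokens generated so far; the bidirectional layer is one way to expose both past and future context during training, which is the limitation of the unidirectional decoder that the excerpt points out.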