Deep Learning based Automatic Image Caption Generation

Kesavan, Varsha; Muley, Vaidehi; Kolhekar, Megha

doi:10.1109/gcat47503.2019.8978293

Cited by 21 publications

(5 citation statements)

References 3 publications

“…All the models are trained on the same dataset for concrete comparison. [5] Detection and Recognition of Objects in Image Caption Generator System: A Deep Learning Approach, N. K. Kumar, D. Vigneswari, A. Mohan, K. Laxman and J. Yuvaraj. The aim of this paper is to detect, recognize and generate worthwhile captions for a given image using deep learning.…”

Section: Literature Review (mentioning)

confidence: 99%

Image Caption Generator by using CNN and LSTM

2023

IJFMR

In this article, we systematically analyze a deep neural network-based image caption generation method. Image captioning aims to automatically generate a sentence description for an image. Our model takes an image as input and generates an English sentence as output, describing the contents of the image. The task has attracted much research attention in cognitive computing in recent years. It is rather complex, as it combines concepts from both the computer vision and natural language processing domains. We have built a working image caption generator using a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) model. After the caption generation phase, we use BLEU scores to evaluate the efficiency of our model. Thus, our system helps the user get a descriptive caption for a given input image.
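The abstract above names BLEU as the evaluation metric but does not say which variant was used. As an illustration only, here is a minimal sketch of a sentence-level BLEU score. It assumes a single reference caption, whitespace tokenization, n-grams up to length 2, and no smoothing. The function names `ngrams` and `bleu` are hypothetical and are not taken from the cited paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=2):
    """Simplified BLEU: clipped n-gram precision times a brevity penalty.

    Assumptions (not from the cited paper): one reference caption,
    whitespace tokenization, uniform n-gram weights, no smoothing.
    """
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, one empty precision zeroes the score
        log_precisions.append(math.log(overlap / total))
    # Penalize candidates that are shorter than the reference.
    if len(cand) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


score = bleu("a dog runs on grass", "a dog runs on the grass")
```

A generated caption identical to the reference scores 1.0, and a caption sharing no bigram with the reference scores 0.0. Published results usually rely on a corpus-level implementation with several references per image, so numbers from this sketch are not comparable to reported scores.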

“…It alters the aim of forecasting the accurate word towards the aim of creating captions that are the same as the ground truth caption. Kesavan et al [10] systematically analyzed distinct deep DNN-based pre-trained models and image caption generation methods to accomplish the effective models by finetuning. The examined model contains with and without 'attention' concepts for optimizing the caption generation capacity.…”

Section: Literature Review (mentioning)

confidence: 99%

Modeling of Hyperparameter Tuned Deep Learning Model for Automated Image Captioning

et al. 2022

Image processing remains a hot research topic because of its applicability in several areas. An important application is automatic image captioning, which aims to automatically generate a proper natural-language description of an image. Image captioning has recently received significant attention in the fields of computer vision and natural language processing (NLP). Because image captioning is a challenging task, recently developed deep learning (DL) models have attained significant performance at the price of increased complexity and computational cost. With these issues in mind, this paper proposes a novel hyperparameter tuned DL for automated image captioning (HPTDL-AIC) technique. The HPTDL-AIC technique comprises two major parts, an encoder and a decoder. The encoder uses Faster SqueezeNet with the RMSProp model to generate an effective depiction of the input image by embedding it into a vector of predefined length. The decoder employs a bird swarm algorithm (BSA) with a long short-term memory (LSTM) model to generate the description sentences. The use of RMSProp and BSA for hyperparameter tuning of the Faster SqueezeNet and LSTM models constitutes the novelty of the work and helps to achieve enhanced image captioning performance. The HPTDL-AIC technique is validated experimentally on two benchmark datasets, and an extensive comparative study points to its improved performance over recent approaches.

“…• Pooling layer: Layer is used when the images are too large. Pooling is done to make a small size of an image [3]. It is done on each dimension of depth independently, so the image depth will remain the same.…”

Section: Image Caption Architecture (mentioning)

confidence: 99%
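The pooling statement above says that pooling shrinks the image while acting on each depth slice independently. A minimal sketch of that behavior follows, assuming 2x2 max pooling with a stride equal to the window size and an image stored as nested lists indexed `image[row][col][channel]`. The function name `max_pool2d` and the toy input are illustrative assumptions, not code from the citing paper.

```python
def max_pool2d(image, size=2):
    """Max-pool an image stored as image[row][col][channel].

    Uses a size x size window with stride equal to size (an assumption;
    other window sizes, strides, or average pooling are equally common).
    """
    height, width, depth = len(image), len(image[0]), len(image[0][0])
    pooled = []
    for r in range(0, height - size + 1, size):
        pooled_row = []
        for c in range(0, width - size + 1, size):
            # Each channel is reduced on its own, so the depth never changes.
            pooled_row.append([
                max(image[r + dr][c + dc][ch]
                    for dr in range(size) for dc in range(size))
                for ch in range(depth)
            ])
        pooled.append(pooled_row)
    return pooled


# Toy 4x4 image with 2 channels: channel 0 counts up, channel 1 counts down.
image = [[[r * 4 + c, 15 - (r * 4 + c)] for c in range(4)] for r in range(4)]
pooled = max_pool2d(image)
```

Here the 4x4x2 input becomes a 2x2x2 output: height and width are halved while the depth of 2 is preserved.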

“…By using the hierarchical structure of LSTM, various aspects can be obtained for different levels of information, and attention can be determined based on seen information or details of language [7]. The losses are produced to check accuracy and to understand the learning parameters of networks [3]. The above structure is used for various scene captioning tasks that is captioning of video and image by using various terms of aspect extraction, structure of networks and losses.…”

Section: Hierarchical Structure Of LSTM (mentioning)

confidence: 99%

“…There are many progresses made in the research field in the past few years such as the identifying of objects from a source image, classification of attributes, classification of the image, and classification of actions by living beings. There are a few steps in the creation of subtitles, like visual pictures are understood from articles, creating relations between objects [3,4], and inscriptions delivering that compare both accurate meaning and detection of each object [5,6]. Second, much more progress has been made in a few years by utilizing attention-dependent framework for caption creation of image and video [7].…”

Section: Introduction (mentioning)

confidence: 99%

Image Caption Generation Using Neural Network Models and LSTM Hierarchical Structure

Waghmare, Shinde

2021

Computational Intelligence in Pattern Recognition

Caption generation is the generation of textual information from images. Objects are extracted from the image and classified into predefined classes, and the extracted objects are then transformed into natural sentences. The recognition process is an iterative task that incorporates image recognition as well as machine vision. It must define relations among objects, persons, and animals and create a textual description of these relations. The paper studies deep learning techniques to discover, identify, and produce good captions for a source image. Image captioning is the process of producing explanations in the form of sentences for a source image, which involves machine vision and natural sentence forming techniques. For these processes, recent models have used deep learning techniques to achieve a great improvement in performance. In addition, a more advanced trend uses attention-dependent structures for captioning. Recent interpreters apply an attention process to each produced term, whether it is a seen term or an unseen term. Unseen terms are easily predicted by a language model without considering seen indicators, but such unseen words could cause bad performance in visual captioning. Taking these issues into consideration, a hierarchy of LSTM (Long Short-Term Memory) with an adaptive attention approach for creating captions for images and videos is presented.
