Pre-Trained CNN Architecture Analysis for Transformer-Based Indonesian Image Caption Generation Model

Mulyawan, Rifqi; Sunyoto, Andi; Muhammad, Alva Hendi

doi:10.30630/joiv.7.2.1387

JOIV: Int. J. Inform. Visualization

2023


Abstract: Classification and object recognition in image processing have significantly advanced computer vision. These methods are widely applied to visual problems, especially image classification with Convolutional Neural Networks (CNNs). In image caption generation, a popular state-of-the-art (SOTA) task, a CNN is often used as the encoder that extracts features from the image. Instead of performing direct classification, these extracted features are sent from the encoder to the decoder…
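The abstract describes reusing a pre-trained CNN as an encoder whose features go to a decoder rather than to a classification head. As a minimal sketch under our own assumptions (the function names, the (H, W, C) layout, and the single linear projection are hypothetical and not taken from the paper), one common way to do this for a Transformer decoder is to treat each spatial position of the last convolutional feature map as one "visual token" and project it to the decoder's model width.

```python
from typing import List

# Hypothetical layout: a feature map indexed as [row][col][channel], shape (H, W, C).
FeatureMap = List[List[List[float]]]


def feature_map_to_sequence(fmap: FeatureMap) -> List[List[float]]:
    """Flatten an (H, W, C) CNN feature map into H*W vectors of length C.

    Each spatial position becomes one token the decoder can attend to via
    cross-attention, instead of pooling the map for a classifier.
    """
    return [list(cell) for row in fmap for cell in row]


def project(tokens: List[List[float]],
            weights: List[List[float]],
            bias: List[float]) -> List[List[float]]:
    """Apply one linear layer (C -> d_model) to every token.

    `weights` has d_model rows of C entries; `weights` and `bias` are
    hypothetical stand-ins for learned parameters.
    """
    return [
        [sum(x * w for x, w in zip(tok, row)) + b for row, b in zip(weights, bias)]
        for tok in tokens
    ]


if __name__ == "__main__":
    fmap = [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
            [[0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]]  # H=2, W=2, C=3
    tokens = feature_map_to_sequence(fmap)        # 4 tokens of length 3
    W = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]        # d_model = 2
    b = [0.0, 0.0]
    print(project(tokens, W, b))  # [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 2.0]]
```

In a real model the feature map would come from a pre-trained backbone with its classification head removed, and the projection would be a trained layer; the point here is only the reshaping from a spatial grid to a sequence.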


Cited by 1 publication

References 29 publications


Improving Pre-trained CNN-LSTM Models for Image Captioning with Hyper-Parameter Optimization

Khassaf, Ali

2024

Eng. Technol. Appl. Sci. Res.


Image captioning, the automatic generation of text describing an image's visual content, has become feasible thanks to advances in object recognition and image classification. Deep learning has attracted much interest from the scientific community and can be very useful in real-world applications. The proposed image captioning approach combines pre-trained Convolutional Neural Network (CNN) models with Long Short-Term Memory (LSTM) networks to generate image captions. The process has two stages: the first trains the CNN-LSTM models with baseline hyper-parameters, and the second trains them again after optimizing and adjusting the hyper-parameters of the first stage. Improvements include a new activation function, regular parameter tuning, and an improved learning rate in the later stages of training. Experimental results on the Flickr8k dataset showed a noticeable and satisfactory improvement in the second stage, with clear increases in the evaluation metrics BLEU-1 to BLEU-4, METEOR, and ROUGE-L. These increases confirmed the effectiveness of the alterations and highlight the importance of hyper-parameter tuning for improving CNN-LSTM models in image captioning tasks.
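The citing work reports BLEU-1 to BLEU-4. As a rough illustration of what those scores measure, the sketch below computes an unsmoothed, sentence-level BLEU-N: clipped n-gram precisions for n = 1..N, combined by a uniform geometric mean and multiplied by a brevity penalty. This is a simplification we chose for clarity, not the evaluation code used in that paper; standard BLEU is usually accumulated over a whole corpus, and evaluation toolkits add their own tokenization and smoothing details.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate: List[str], references: List[List[str]], n: int) -> float:
    """Clipped n-gram precision: each candidate n-gram is credited at most as
    many times as it occurs in any single reference."""
    cand = ngrams(candidate, n)
    if not cand:
        return 0.0
    max_ref: Counter = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return clipped / sum(cand.values())


def sentence_bleu(candidate: List[str], references: List[List[str]], max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU-max_n with uniform weights."""
    precisions = [modified_precision(candidate, references, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # no smoothing: any zero precision makes the score zero
    c = len(candidate)
    # Closest reference length; ties go to the shorter reference.
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)  # brevity penalty
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)


if __name__ == "__main__":
    ref = "a dog runs on the grass".split()
    print(sentence_bleu(ref, [ref]))  # 1.0
```

Setting `max_n` to 1, 2, 3, or 4 gives the BLEU-1 to BLEU-4 variants; short captions often score zero at higher n without smoothing, which is one reason toolkits add it.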

