Corpus Construction and Semantic Analysis of Indonesian Image Description

Nur’Aini, Khoirun Nisa; Effendi, Johanes; Sakti, Sakriani; Adriani, Mirna; Nakamura, Satoshi

doi:10.21437/sltu.2018-9

Cited by 1 publication

(1 citation statement)

References 16 publications

“…x i−1 ; I) [26][27][28][29][30][31][32][33][34][35][36]. Model image captioning has been researched by many using CNN block and language models, such as DenseNet and LSTM [9,17], CNN and LSTM [19,26,33,37,38,39], inceptionV3 and RNN [14], and CNN and BERT [40,41]. One of the important parts of captioning is word embedding, which provides a vector feature value for each word.…”

Section: Introduction (mentioning)

confidence: 99%

Hybrid of Deep Learning and Word Embedding in Generating Captions: Image-Captioning Solution for Geological Rock Images

2022

Captioning is the process of composing a description for an image. Previous captioning research has usually focused on foreground objects, although an image contains both background and foreground objects. In contrast, generating captions for geological images of rocks focuses mainly on the image background. This study proposes an image-captioning model built from a convolutional neural network (CNN), long short-term memory (LSTM), and word2vec, with a dense output of 256 units. To produce grammatical sentences, the sequence of predicted words was reconstructed into a sentence by the beam search algorithm with K = 3. The pre-trained baseline model VGG16 and the proposed CNN-A, CNN-B, CNN-C, and CNN-D models were evaluated with n-gram BLEU scores. The BLEU-1 scores for these models were 0.5515, 0.6463, 0.7012, 0.7620, and 0.5620, respectively. The BLEU-2 scores were 0.6048, 0.6507, 0.7083, 0.8756, and 0.6578, respectively. The BLEU-3 scores were 0.6414, 0.6892, 0.7312, 0.8861, and 0.7307, respectively. Finally, the BLEU-4 scores were 0.6526, 0.6504, 0.7345, 0.8250, and 0.7537, respectively. The CNN-C model outperformed the other models, in particular the baseline model. Several challenges remain for future work on captioning, such as geological sentence structure, geological sentence phrases, and constructing words with a geological tagger.
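
The abstract mentions reconstructing the predicted words into a sentence by beam search with K = 3. The sketch below illustrates that general technique only; it is not the authors' implementation. The function `beam_search`, the `step_fn` interface, and the toy probability table are all hypothetical, and a real captioner would obtain next-word probabilities from the CNN-LSTM decoder instead.

```python
import math


def beam_search(step_fn, start_token, end_token, beam_width=3, max_len=20):
    """Return the highest-scoring token sequence found by beam search.

    step_fn(sequence) is an assumed interface: it returns a dict mapping
    each candidate next token to its probability given the sequence so far.
    """
    beams = [([start_token], 0.0)]  # (sequence, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for token, prob in step_fn(seq).items():
                if prob > 0.0:
                    candidates.append((seq + [token], score + math.log(prob)))
        if not candidates:
            break
        # Keep only the beam_width best partial sentences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            if seq[-1] == end_token:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # fall back to unfinished beams at max_len
    return max(finished, key=lambda c: c[1])[0]


# Toy next-word table (illustrative values, not from the paper).
TOY_MODEL = {
    "<start>": {"grey": 0.6, "a": 0.4},
    "grey": {"rock": 0.5, "stone": 0.5},
    "a": {"rock": 0.9, "stone": 0.1},
    "rock": {"<end>": 1.0},
    "stone": {"<end>": 1.0},
}


def toy_step(seq):
    return TOY_MODEL[seq[-1]]


best = beam_search(toy_step, "<start>", "<end>", beam_width=3)
greedy = beam_search(toy_step, "<start>", "<end>", beam_width=1)
print(" ".join(best))
print(" ".join(greedy))
```

In this toy table, a beam width of 1 (greedy decoding) commits to the locally most probable first word, while a width of 3 keeps the alternative opening alive long enough to find a sentence with higher overall probability. The sketch compares raw log-probabilities without length normalisation, which is a simplification.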
