2019 Twelfth International Conference on Contemporary Computing (IC3)
DOI: 10.1109/ic3.2019.8844921

Image Captioning using Google's Inception-resnet-v2 and Recurrent Neural Network

Cited by 40 publications (14 citation statements). References 1 publication.
“…CNN architectures have been particularly used for image detection, segmentation and classification because images have a special spatial property in their formation, such as edges, textures, gradients, orientation and color [ 15 ]. Many deep learning architectures have been proposed for automatic pattern recognition, such as the Inception-ResNet-v2, Inception-v3, VGG19, ResNet-50, DenseNet-201, Xception and MobileNetV2 architectures, with different performances depending on the characteristics of the data [ 17 , 18 , 19 , 20 , 21 , 22 , 23 ]. These CNN architectures have enabled the development of human-like efficient machines in different domains of application [ 15 ].…”
Section: Introduction
confidence: 99%
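As an illustration of the pretrained backbones this statement lists, here is a minimal sketch, assuming TensorFlow/Keras is available; none of this code comes from the cited papers, and the pooling choice is an assumption.

```python
# Minimal sketch, assuming TensorFlow/Keras: instantiate the pretrained CNN
# backbones named above as ImageNet feature extractors.
from tensorflow.keras import applications

backbones = {
    "Inception-ResNet-v2": applications.InceptionResNetV2,
    "Inception-v3": applications.InceptionV3,
    "VGG19": applications.VGG19,
    "ResNet-50": applications.ResNet50,
    "DenseNet-201": applications.DenseNet201,
    "Xception": applications.Xception,
    "MobileNetV2": applications.MobileNetV2,
}

for name, build in backbones.items():
    # include_top=False drops the ImageNet classifier so the convolutional
    # trunk can be reused as a feature extractor for a new task.
    model = build(weights="imagenet", include_top=False, pooling="avg")
    print(name, model.output_shape)
```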
“…The next stage consists of simultaneously convolving one input using a different filter size for each convolution and then concatenating the results. The subsequent parts of the network repeat these blocks 10 or 20 times, and the network uses dropout layers that set values to 0 to avoid overfitting [42].…”
Section: Classification
confidence: 99%
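A minimal sketch of the block structure described here, assuming TensorFlow/Keras: parallel convolutions of the same input with different filter sizes, concatenated, followed by dropout. The filter counts, sizes, and dropout rate are illustrative assumptions, not the cited network's exact values.

```python
# Minimal sketch, assuming TensorFlow/Keras: an Inception-style block that
# convolves one input with several filter sizes in parallel and concatenates
# the branches; filter counts/sizes are illustrative only.
from tensorflow.keras import Input, Model, layers

def inception_block(x, filters=32):
    b1 = layers.Conv2D(filters, 1, padding="same", activation="relu")(x)  # 1x1 branch
    b3 = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)  # 3x3 branch
    b5 = layers.Conv2D(filters, 5, padding="same", activation="relu")(x)  # 5x5 branch
    return layers.Concatenate()([b1, b3, b5])  # join branches along the channel axis

inputs = Input(shape=(299, 299, 3))
x = inception_block(inputs)
x = layers.Dropout(0.2)(x)  # dropout zeroes a fraction of activations to curb overfitting
model = Model(inputs, x)
model.summary()
```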
“…In this study, the convolutional neural network architectures VGG16 [40], VGG19 [41], Inception-ResNetV2 [42], InceptionV3 [43], and DenseNet201 [44] are explored in different experiments to extract the characteristics of spectral images of coffee fruits at different stages of ripening, in order to determine which of them achieves the best results compared with the traditional classification carried out by experts, who evaluate the color tonalities present in the skin of the fruits at the moment of harvesting. For this purpose, 4 experiments were carried out, implementing class-imbalance handling techniques on the training data: balancing, subsampling, oversampling, and weighting.…”
Section: Introduction
confidence: 99%
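Of the four imbalance-handling techniques mentioned, class weighting is the simplest to sketch. The labels below are hypothetical, and the scikit-learn call is an assumption about tooling, not the cited study's actual pipeline.

```python
# Minimal sketch, assuming scikit-learn and NumPy: class weights computed
# from an illustrative, imbalanced training-label distribution.
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y_train = np.array([0, 0, 0, 1, 1, 2])  # hypothetical ripening-stage labels
classes = np.unique(y_train)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_train)
class_weight = dict(zip(classes, weights))
print(class_weight)  # roughly {0: 0.67, 1: 1.0, 2: 2.0}; pass to model.fit(..., class_weight=...)
```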
“…A subset of the more than a million images in the ImageNet database was used to train this network. The Google Inception CNN model (Bhatia et al., 2019), which was initially created for the ImageNet Recognition Challenge, is now in its third iteration. Using Inception V3, we flattened the output, added a fully connected layer with 1024 hidden units and a ReLU activation function with a dropout rate of 0.4, and reduced the output dimension to one with a sigmoid layer for classification.…”
Section: Inception V3
confidence: 99%
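The head described in this statement maps naturally to a short sketch, assuming TensorFlow/Keras; the input shape, frozen base, and training settings are assumptions rather than details taken from the cited work.

```python
# Minimal sketch, assuming TensorFlow/Keras: an ImageNet-pretrained InceptionV3
# base whose output is flattened and followed by a 1024-unit ReLU layer,
# dropout of 0.4, and a single sigmoid unit for classification.
from tensorflow.keras import Model, layers
from tensorflow.keras.applications import InceptionV3

base = InceptionV3(weights="imagenet", include_top=False, input_shape=(299, 299, 3))
base.trainable = False  # keep the pretrained convolutional features fixed (assumption)

x = layers.Flatten()(base.output)
x = layers.Dense(1024, activation="relu")(x)
x = layers.Dropout(0.4)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)

model = Model(base.input, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```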