Deep image captioning using an ensemble of CNN and LSTM based deep neural networks

Alzubi, Jafar A.; Jain, Rachna; Nagrath, Preeti; Satapathy, Suresh Chandra; Taneja, Soham; Gupta, Paras

doi:10.3233/jifs-189415

Cited by 63 publications

(9 citation statements)

References 3 publications

“…Deep learning has been established as a powerful tool in many scientific sectors due to its ability to extract high-level features for complex pattern recognition problems. Cultural heritage has also exploited the benefits of deep learning, especially for image and Natural Language Processing (NLP) related applications, such as for automatic image captioning that combines both computer vision and NLP [2]. Preservation and diagnostics of cultural heritage findings, e.g., paintings, sculptures, documents, and artworks, are crucial to determine the historical status of findings and extract the missing knowledge.…”

Section: Related Work (mentioning)

confidence: 99%

Connecting national flags – a deep learning approach

Kalampokas, Mentizis, Vrochidou et al. 2023

Multimed Tools Appl

National flags are the most recognizable symbols of the identity of a country. Similarities between flags may be observed due to cultural, historical, or ethical connections between nations, because they may be originated from the same group of people, or due to unrelated sharing of common symbols and colors. Although the fact that similar flags exist is indisputable, this has never been quantified. Quantifying flags’ similarities could provide a useful body of knowledge for vexillologists and historians. To this end, this work aims to develop a supporting tool for the scientific study of nations’ history and symbolisms, through the quantification of the varying degrees of similarity between their flags, by considering three initially stated hypotheses and by using a novel feature inclusion (FI) measure. The proposed FI measure aims to objectively quantify the overall similarity between flags based on optical multi-scaled features extracted from flag images. State-of-the-art deep learning models built for other applications tested their capability for the first time for the problem under study by using transfer learning, towards calculating the FI measure. More specifically, FI was quantified by six deep learning models: Yolo (V4 and V5), SSD, RetinaNet, Fast R-CNN, FCOS and CornerNet. Flags’ images dataset included flags of 195 nations officially recognized by the United Nations. Experimental results reported maximum feature inclusion between flags of up to 99%. The extracted degrees of similarity were subsequently justified with the help of the Vexillology scientific domain, to support research findings and to raise questions for further investigation. Experimental results reveal that the proposed approach and FI measure are reliable and able to serve as a supporting tool to social sciences for knowledge extraction and quantification.

“…A significant limitation to this work is that L is pre-defined before training, and therefore the model only works on fixed-length CAPTCHA schemes. Combining CNN with LSTM to extract spatial and sequential features is successful in other similar areas like image captioning [1].…”

Section: Related Work (mentioning)

confidence: 99%

Multiview deep learning-based attack to break text-CAPTCHAs

Yusuf, Srivastava, Singh et al. 2022

Int. J. Mach. Learn. & Cyber.

Completely Automated Public Turing Test To Tell Computer and Humans Apart (CAPTCHA) is a computer program that prevents malicious computer users. Text-CAPTCHA schemes utilize less-computational costs. Hence, they are the most popularly used. This paper investigates the effectiveness of state-of-the-art (SOTA) text-CAPTCHA schemes, proposes a Multiview deep learning system to break them, and highlights their weaknesses. Rather than the usual single-view feature extraction, the proposed model explores correlational features from multiple views to increase the model’s generalization and classification accuracy. The model combines convolutional neural networks and recurrent networks to preserve the input text-CAPTCHA’s spatial and sequential order. The proposed system has successfully achieved average accuracies ranging from 93.6% to 100%, and the average time to break a text-CAPTCHA scheme ranges from 0.0032 to 0.21 seconds on eight different datasets. Furthermore, an ablation study on 71 human users was conducted to evaluate the effectiveness of the schemes. The results demonstrated that the proposed system effectively outperforms the human users whom the schemes were designed to serve. Lastly, when compared with existing systems, the proposed system outperforms existing SOTA systems with an accuracy gap of almost 40% higher.

“…The packet delivery rate of the proposed method was 20% higher than the existing methods, and the packet loss rate of the system was reduced by 15%. Alzubi et al [25] used a deep neural network to integrate and study depth image subtitles. The researchers employed a user-defined integration model composed of an inception model and a two-layer long short-term memory (LSTM) model.…”

Section: Recent Related Work (mentioning)

confidence: 99%

Artificial Intelligence Algorithms in Ice and Snow Tourism Promotion from Digital Technology

Zhi-liang, Guo 2022

Wireless Communications and Mobile Computing

The present work is aimed at using the artificial intelligence (AI) algorithm to study the promotion and publicity of ice and snow tourism (IST). Firstly, the urgent needs of IST external publicity are analyzed based on digital technology. Besides, the dynamic vision sensor technology is used to collect data in the distributed Internet of Things structure of the IST publicity system. Then, the AlexNet algorithm is combined with the digital logic method. The research assumption is that the complete IST image based on AI can be formed according to the AlexNet algorithm. The improved AlexNet algorithm and Chi-square test implement the IST poster emotion recognition and IST publicity model. Then, the intelligent customer service of the IST publicity platform is studied. The results demonstrate that after 120 iterations, the accuracy of the sports-oriented publicity method based on AlexNet can reach about 75%. In addition, the accuracy of the recommended IST publicity algorithm based on AlexNet is close to 90% after 80 iterations. Therefore, the model’s accuracy is improved by at least 9.6% compared with the traditional method. The research has practical application value for the digital and intelligent development of the IST industry.
