Evolution of visual data captioning Methods, Datasets, and evaluation Metrics: A comprehensive survey

Sharma, Dhruv; Dhiman, Chhavi; Kumar, Dinesh

doi:10.1016/j.eswa.2023.119773

Cited by 6 publications

(4 citation statements)

References 166 publications

“…Equation (11) shows the expression used to evaluate the logarithmic decrement of the discrete signal, while Equation (12) describes the relation that expresses the damping ratio ξ₁ in the case of an underdamped vibrating system:…”

Section: Numerical Activity (mentioning)

confidence: 99%

“…Today, however, most studies present in the literature consist of sloshing applications for vibration mitigation activities in structures [4][5][6]. In this work, the authors defined a simplified multibody model developed through a CFD analysis to determine the optimal operating conditions for serial manipulators used in visual control stations for glass containers [7][8][9][10][11]. The SimScape multibody multidomain simulation environment is increasingly used in complex systems analysis due to its ability to model the different subdomains that characterize real systems in a single environment [12][13][14].…”

Section: Introduction (mentioning)

confidence: 99%

Multibody Analysis of Sloshing Effect in a Glass Cylinder Container for Visual Inspection Activities

De Simone, Veneziano, Pace et al. 2024, Applied Sciences

This paper addresses the phenomenon of sloshing and the issues that arise during liquid handling at visual inspection stations. The pharmaceutical industry, recently put under pressure by the pandemic, has long adopted modular solutions consisting mainly of robotic islands. This work focuses on a visual inspection island for glass vials and ampules called VRU. This machine uses robotic arms to optimize the inspection process and enables automated control of a wide range of products using image recognition techniques and AI algorithms. However, the handling of containers in the presence of liquids requires special precautions to avoid the occurrence of bubbles inside the fluid that can prevent the cameras from correctly capturing any defects present. The banal solution involves a drastic reduction in the speeds and accelerations to which the liquids are subjected. However, using appropriate techniques makes it possible to achieve performance values similar to those obtainable when manipulating solid materials. The developed algorithms were tested using multibody simulations in the Mathworks Simscape environment and then validated using a six-axis Fanuc robot. In this study, however, the analysis conducted aimed to determine the correlations between trajectories, laws of motion, and sloshing in containers handled at high speed in industrial applications. In this study a multibody model was developed using a CFD analysis. The container consisted of a glass vial for pharmaceutical uses containing a liquid inside. The results obtained from the CFD analysis allowed us to calibrate the multibody model for the next phase of optimization of the laws of motion to be followed by the manipulator.

“…In [9] paper examines workflows, feature representation, visual encoding, language generation models, data-sets, and assessments of deep photo captioning approaches in natural and medical sciences. The author [10] assisting the visually impaired, interacting with robots, and video surveillance systems are just a few examples of the many applications that Automatic Visual Captioning (AVC), a deep-learning technology, is utilized in. [11] This review paper weighs the benefits and drawbacks of Image-Caption Generator, an AI technology that helps people with visual impairments comprehend images and define language.…”

Section: Overview Of An Earlier Review On Image Captioning (mentioning)

confidence: 99%

A Comprehensive Survey on Image Captioning for Indian Languages: Techniques, Datasets, and Challenges

Jayaswal, Rani, Kaur 2023, Preprint

In image captioning, we generate visual descriptions from an image. Image Captioning requires identifying the key entity, feature, and association in an image. There is also a requirement to generate captions that are syntactically and semantically correct. The process of image captioning requires computer vision and natural language processing. In the past few decades, a substantial attempt has been made to generate the caption for images. In this survey article, we are going to present an extensive survey on image captioning for Indian Languages. To summarize recent research work in image captioning, first, we briefly review the traditional approach to image captioning depending on template and retrieval. Further deep-learning approaches for image captioning are concentrated which are classified as encoder-decoder architecture, attention-based approach, and transformer architecture. Our main focus in this survey is based on image captioning techniques for Indian languages like Hindi, Bengali, Assamese, etc. After that, we analyze the state-of-the-art approach on the most widely dataset i.e. MS COCO dataset with their strengths, limitations, and performance metrics i.e. BLEU, ROUGE, METEOR, CIDEr, SPICE. At last, we explore discussion on open challenges and future direction in the field of image captioning.

“…Fig. 1 demonstrates a typical deep-learning-based approach for image-to-text description [5]. Concerning the encoder-decoder image captioning approaches, Convolutional Neural Networks (CNNs) have been exploited as encoders for visual feature extraction from the images, and Recurrent Neural Networks (RNNs), "especially LSTM (Long Short-Term Memory) networks" have been exploited as decoders for transforming the obtained features into various natural languages [6,7]. However, encoder-decoder-based approaches are not capable of analyzing the images over time and considering the spatial prospects of images that are pertinent to the image description (alternatively, creating descriptions for the entire scene).…”

Section: Introduction (mentioning)

confidence: 99%

Image to Text Description Approach based on Deep Learning Models

Hameed Arif 2024, BAJEST

The image-to-text description can be indicated by creating captions for images that comply with human language perception. Nowadays, with the speedy progress of deep learning models, image-to-text description (or image captioning) has an expanding consideration by numerous researchers in diverse artificial intelligence relevant applications. In general, accurately getting the semantic information of the principal objects in the images and captioning the association among them represents a crucial issue in this field. In this paper, an image-to-text description approach based on Inception-ResNetV2-LSTM with an attention technique is proposed for effective textual descriptions of images. In this proposed approach, Inception-ResNetV2 is exploited to extract essential features, and the integration of LSTM with the attention technique is implemented as a sentence-creation model in such a way that the learning could be concentrated on specific portions within the images, hence enhancing the performance of image-to-text description approach. In terms of the Meteor and BLEU (1-4) measurements, the proposed approach outperformed other state-of-the-art approaches with 0.787 and (0.977, 0.964, 0.886, and 0.759), respectively.
