A Scene-to-Speech Mobile based Application: Multiple Trained Models Approach

Karkar, AbdelGhani; Kunhoth, Jayakanth; Al‐Maadeed, Somaya

doi:10.1109/iciot48696.2020.9089557

Cited by 4 publications

(3 citation statements)

References 29 publications

“…Karkar et al [105] present the concept of scene to speech (STS). STS recognizes the elements in a captured image or a video clip and speaks, loudly, informative textual content that describes the scene.…”

Section: Results (mentioning)

confidence: 99%

Artificial Intelligence of Things Applied to Assistive Technology: A Systematic Literature Review

Freitas, Piai, Farias, et al. 2022. Sensors

According to the World Health Organization, about 15% of the world’s population has some form of disability. Assistive Technology, in this context, contributes directly to the overcoming of difficulties encountered by people with disabilities in their daily lives, allowing them to receive education and become part of the labor market and society in a worthy manner. Assistive Technology has made great advances in its integration with Artificial Intelligence of Things (AIoT) devices. AIoT processes and analyzes the large amount of data generated by Internet of Things (IoT) devices and applies Artificial Intelligence models, specifically, machine learning, to discover patterns for generating insights and assisting in decision making. Based on a systematic literature review, this article aims to identify the machine-learning models used across different research on Artificial Intelligence of Things applied to Assistive Technology. The survey of the topics approached in this article also highlights the context of such research, their application, the IoT devices used, and gaps and opportunities for further development. The survey results show that 50% of the analyzed research addresses visual impairment, and, for this reason, most of the topics cover issues related to computational vision. Portable devices, wearables, and smartphones constitute the majority of IoT devices. Deep neural networks represent 81% of the machine-learning models applied in the reviewed research.

“…Regardless these work's importance, they have not focused on mobile devices. In this way, the work [12] created an application responsible for converting video content into audio descriptions, which was implemented on ARM-based processor hardware. The researchers utilized a series of specialized models for fine-grained object classification, each focusing on a specific category.…”

Section: Related Work (mentioning)

confidence: 99%

A mobile device framework for video captioning using multimodal neural networks

Damaceno, Cesar Jr. 2023. Anais Estendidos da XXXVI Conference on Graphics, Patterns and Images (SIBGRAPI Estendido 2023)

Video captioning is a computer vision task aimed at providing textual descriptions for videos. There are numerous strategies and datasets that can be employed to create models capable of addressing this task. In this study, we have devised a deep learning-based strategy that leverages both audio and image content to generate captions using resource-constrained devices. The datasets utilized include MSR-VTT and TREC-VTT22. We have developed an application tailored for resource-constrained devices that utilizes the optimal model resulting from our training process. Both modalities of data are then combined and processed by the model to generate a comprehensive description related to the captured data. The primary contribution of this work lies in the introduction of an innovative end-to-end application that leverages audio and image data. This application can be utilized on a mobile device to autonomously produce descriptions.

“…A mobile device embedded with a smart scanner or camera is assigned to scan or capture the tags for visual marker identification. Non-tag-based systems [7,9,16] do not utilize any visual marker or barcodes. Instead, they process the raw imageries and apply various image feature detection algorithms and machine learning algorithms to recognize the objects.…”

Section: Introduction (mentioning)

confidence: 99%

Smartphone-based food recognition system using multiple deep CNN models

Fakhrou, Kunhoth. 2021. Multimed Tools Appl. Self-citation.

People with blindness or low vision utilize mobile assistive tools for various applications such as object recognition and text recognition. Most of the available applications focus on recognizing generic objects and have not addressed the recognition of food dishes and fruit varieties. In this paper, we propose a smartphone-based system for recognizing food dishes as well as fruits for children with visual impairments. The smartphone application utilizes a trained deep CNN model for recognizing the food item from real-time images. Furthermore, we develop a new deep convolutional neural network (CNN) model for food recognition using the fusion of two CNN architectures. The new deep CNN model is developed using the ensemble learning approach. The deep CNN food recognition model is trained on a customized food recognition dataset. The customized food recognition dataset consists of 29 varieties of food dishes and fruits. Moreover, we analyze the performance of multiple state-of-the-art deep CNN models for food recognition using the transfer learning approach. The ensemble model performed better than the state-of-the-art CNN models and achieved a food recognition accuracy of 95.55% on the customized food dataset. In addition, the proposed deep CNN model is evaluated on two publicly available food datasets to demonstrate its efficacy for food recognition tasks.
