Modern smartphones can provide a multitude of services to assist people with visual impairments, and their cameras in particular can be useful for assisting with tasks, such as reading signs or searching for objects in unknown environments. Previous research has looked at ways to solve these problems by processing the camera's video feed, but very little work has been done in actively guiding the user towards specific points of interest, maximising the effectiveness of the underlying visual algorithms. In this paper, we propose a control algorithm based on a Markov Decision Process that uses a smartphone's camera to generate realtime instructions to guide a user towards a target object. The solution is part of a more general active vision application for people with visual impairments. An initial implementation of the system on a smartphone was experimentally evaluated with participants with healthy eyesight to determine the performance of the control algorithm. The results show the effectiveness of our solution and its potential application to help people with visual impairments find objects in unknown environments. • a set of user experiments that prove the effectiveness of our active search implementation.
Object detection plays a crucial role in the development of Electronic Travel Aids (ETAs), capable to guide a person with visual impairments towards a target object in an unknown indoor environment. In such a scenario, the object detector runs on a mobile device (e.g. smartphone) and needs to be fast, accurate, and, most importantly, lightweight. Nowadays, Deep Neural Networks (DNN) have become the state-of-the-art solution for object detection tasks, with many works improving speed and accuracy by proposing new architectures or extending existing ones. A common strategy is to use deeper networks to get higher performance, but that leads to a higher computational cost which makes it impractical to integrate them on mobile devices with limited computational power. In this work we compare different object detectors to find a suitable candidate to be implemented on ETAs, focusing on lightweight models capable of working in real-time on mobile devices with a good accuracy. In particular, we select two models: SSD Lite with Mobilenet V2 and Tiny-DSOD. Both models have been tested on the popular OpenImage dataset and a new dataset, named L-CAS Office dataset, collected to further test models' performance and robustness in a real scenario inspired by the actual perception challenges of a user with visual impairments.
Terrain mapping has a many use cases in both land surveyance and autonomous vehicles. Popular methods generate occupancy maps over 3D space, which are sub-optimal in outdoor scenarios with large, clear spaces where gaps in LiDAR readings are common. A terrain can instead be modelled as a height map over 2D space which can iteratively be updated with incoming LiDAR data, which simplifies computation and allows missing points to be estimated based on the current terrain estimate. The latter point is of particular interest, since it can reduce the data collection effort required (and its associated costs) and current options are not suitable to real-time operation. In this work, we introduce a new method that is capable of performing such terrain mapping and inferencing tasks in real-time. We evaluate it with a set of mapping scenarios and show it is capable of generating maps with higher accuracy than an OctoMap-based method.
Abstract. The Solar Thermal Energy Research Group (STERG) is investigating ways to make heliostats cheaper to reduce the total cost of a concentrating solar power (CSP) plant. One avenue of research is to use unmanned aerial vehicles (UAVs) to automate and assist with the heliostat calibration process. To do this, the pose estimation error of each UAV must be determined and integrated into a calibration procedure. A computer vision (CV) system is used to measure the pose of a quadcopter UAV. However, this CV system contains considerable measurement errors. Since this is a high-dimensional problem, a sophisticated prediction model must be used to estimate the measurement error of the CV system for any given pose measurement vector. This paper attempts to train and validate such a model with the aim of using it to determine the pose error of a quadcopter in a CSP plant setting.
Sound perception is a fundamental skill for many people with severe sight impairments. The research presented in this paper is part of an ongoing project with the aim to create a mobile guidance aid to help people with vision impairments find objects within an unknown indoor environment. This system requires an effective non-visual interface and uses bone-conduction headphones to transmit audio instructions to the user. It has been implemented and tested with spatialised audio cues, which convey the direction of a predefined target in 3D space. We present an in-depth evaluation of the audio interface with several experiments that involve a large number of participants, both blindfolded and with actual visual impairments, and analyse the pros and cons of our design choices. In addition to producing results comparable to the state-of-the-art, we found that Fitts's Law (a predictive model for human movement) provides a suitable a metric that can be used to improve and refine the quality of the audio interface in future mobile navigation aids. CCS Concepts: • Human-centered computing → HCI theory, concepts and models; Laboratory experiments; Sound-based input / output; Pointing devices; Empirical studies in accessibility.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.