2020
DOI: 10.1109/access.2020.2984718

Unifying Visual Localization and Scene Recognition for People With Visual Impairment

Abstract: With the development of computer vision and mobile computing, assistive navigation for people with visual impairment has drawn increasing attention from the research community. Two key challenges of assistive navigation, ''Where am I?'' and ''What are the surroundings?'', remain to be resolved by taking advantage of visual information. In this paper, we leverage a prevailing compact network as the backbone to build a unified network featuring two branches that implement scene description and scene recognition separately…

Cited by 21 publications (14 citation statements)
References: 42 publications
“…This solution implicitly fuses geometric and semantic information in the features extracted for place recognition. Another example of a multi-task architecture, in the domain of assistive technologies, is demonstrated in [2], which introduces a model with a single backbone and two heads, one for VPR and the other for scene recognition.…”
Section: Multi-task Architectures (mentioning, confidence: 99%)
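The single-backbone, two-head design that [2] describes can be sketched concretely. Below is a minimal PyTorch sketch, assuming a MobileNetV2 backbone, a 256-dimensional descriptor, and 10 scene classes (all illustrative choices, not the cited paper's exact configuration): one head emits an L2-normalized global descriptor for VPR retrieval, the other emits scene-class logits.

```python
# Minimal sketch of a single backbone with two heads (VPR + scene recognition).
# Backbone choice, descriptor size, and class count are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import mobilenet_v2

class LocalizeAndRecognize(nn.Module):
    def __init__(self, descriptor_dim=256, num_scene_classes=10):
        super().__init__()
        self.backbone = mobilenet_v2(weights=None).features   # shared compact backbone
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.vpr_head = nn.Linear(1280, descriptor_dim)       # global descriptor for retrieval
        self.scene_head = nn.Linear(1280, num_scene_classes)  # scene-class logits

    def forward(self, x):
        feat = self.pool(self.backbone(x)).flatten(1)         # (B, 1280)
        descriptor = F.normalize(self.vpr_head(feat), dim=1)  # L2-normalized for cosine matching
        return descriptor, self.scene_head(feat)

model = LocalizeAndRecognize()
desc, logits = model(torch.randn(1, 3, 224, 224))             # desc: (1, 256), logits: (1, 10)
```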
“…Pertaining to the last point, recognizing places by vision is regarded as a key component of localization and navigation, being used for loop closure in SLAM algorithms in GPS-denied environments as well as an input for learning navigation policies [1] under different conditions. Remarkably, the development of visual localization in robotics is also paving the way for new applications of VPR, such as assistive technologies for people with visual impairments [2].…”
Section: Introduction (mentioning, confidence: 99%)
“…They employed an external GPS tracker to determine the user's location, using a u-blox NEO-6M chip with a location accuracy of less than 0.4 m. A second approach is image-based positioning. This approach determines the user's location by querying a captured image against a dataset that contains images and location information [36, 37, 38, 58]. V-Eye [39] used visual simultaneous localization and mapping (SLAM) and model-based localization (MBL) to localize the BVIP with a median error of approximately 0.27 m.…”
Section: Journey Planning (mentioning, confidence: 99%)
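A minimal sketch of that image-based positioning idea, assuming descriptors have already been extracted for both the query image and the geotagged database images (the random vectors and the (lat, lon) table below merely stand in for real features and map data):

```python
# Image-based positioning as nearest-neighbour retrieval: match a query
# descriptor against a database of geotagged image descriptors and return
# the best match's stored coordinates.
import numpy as np

def localize(query_desc, db_descs, db_locations):
    """query_desc: (D,) L2-normalized; db_descs: (N, D) L2-normalized;
    db_locations: (N, 2) array of (lat, lon) per database image."""
    sims = db_descs @ query_desc           # cosine similarity to each DB image
    best = int(np.argmax(sims))            # most similar reference image
    return db_locations[best], sims[best]

# Toy usage with synthetic descriptors in place of real image features.
rng = np.random.default_rng(0)
db = rng.normal(size=(100, 256))
db /= np.linalg.norm(db, axis=1, keepdims=True)
locs = rng.uniform(size=(100, 2))
q = db[42] + 0.01 * rng.normal(size=256)   # noisy re-observation of image 42
q /= np.linalg.norm(q)
print(localize(q, db, locs))
```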
“…There are several research approaches used to help BVIP interpret their immediate environments, such as scene recognition [58], multi-object detection [42], and scene captioning [43]. Scene recognition classifies an image into predefined classes [58], while multi-object detection detects multiple objects in a single image [42]. Scene captioning is considered the most suitable in this case, as it describes objects in their context (environment) and their relations in a sentence [129].…”
Section: Real-time Navigation (mentioning, confidence: 99%)
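To make the contrast concrete, here is a schematic sketch of the three output types; the structures are assumptions for illustration only, not any cited system's interface:

```python
# Schematic output types for the three approaches (illustrative structures).
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class SceneRecognitionResult:     # one label for the whole image
    scene_class: str              # e.g. "corridor"

@dataclass
class ObjectDetectionResult:      # several labelled boxes per image
    boxes: List[Tuple[float, float, float, float]]  # (x1, y1, x2, y2)
    labels: List[str]             # one label per box

@dataclass
class SceneCaptionResult:         # objects and their relations, in a sentence
    sentence: str                 # e.g. "a chair next to a table by the window"
```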
“…First, candidate-region-based object detection methods, such as Hybrid Task Cascade [13], CenterMask [14], and PolyTransform [15]; second, regression-based object detection methods, such as YOLO [16,17], SSD [18], and FPN [19]; and third, search-based object detection methods, such as AttentionNet [20] and reinforcement-learning-based object detection algorithms [21]. Many scholars have incorporated deep learning into technical solutions for indoor positioning and navigation: a fingerprint localization algorithm based on Deep Belief Networks (DBN) with noise reduction achieves target localization in specific indoor environments [22]; deep learning methods automatically encode and extract deep features from Wi-Fi fingerprint data, creating a deep-feature location fingerprint database with one-to-many relationships for indoor localization [23]; and a scene recognition classification step has been added to a visual localization system [24]. At present, the image quality, pixel resolution, and sensor and aperture performance of video frames captured by cell-phone cameras have improved significantly.…”
Section: Introduction (mentioning, confidence: 99%)
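As a simplified sketch of the fingerprint-database idea behind [22] and [23]: offline, reference positions are paired with Wi-Fi RSSI vectors; online, a measured vector is matched to its k nearest fingerprints and their positions are averaged. The cited works replace this plain matching with DBN and deep-feature encodings; the kNN below is only an illustrative stand-in.

```python
# Fingerprint-based indoor localization via k-nearest-neighbour matching.
# The fingerprint database and k are illustrative assumptions.
import numpy as np

def knn_fingerprint_localize(rssi, db_fingerprints, db_positions, k=3):
    """rssi: (A,) measured RSSI per access point; db_fingerprints: (N, A);
    db_positions: (N, 2) reference (x, y) coordinates."""
    dists = np.linalg.norm(db_fingerprints - rssi, axis=1)
    nearest = np.argsort(dists)[:k]        # k most similar stored fingerprints
    return db_positions[nearest].mean(axis=0)

# Toy usage: 4 reference points, 3 access points.
db_fp = np.array([[-40., -60., -70.], [-55., -45., -65.],
                  [-70., -50., -40.], [-60., -70., -50.]])
db_xy = np.array([[0., 0.], [0., 5.], [5., 5.], [5., 0.]])
print(knn_fingerprint_localize(np.array([-42., -58., -68.]), db_fp, db_xy, k=2))
```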