2020
DOI: 10.1109/access.2020.2984718

Unifying Visual Localization and Scene Recognition for People With Visual Impairment

Abstract: With the development of computer vision and mobile computing, assistive navigation for people with visual impairment has drawn increasing attention from the research community. Two key challenges of assistive navigation, ''Where am I?'' and ''What are the surroundings?'', remain to be resolved by taking advantage of visual information. In this paper, we leverage a prevailing compact network as the backbone to build a unified network featuring two branches that implement scene description and scene recognition separately…

Cited by 21 publications (14 citation statements)
References: 42 publications
“…This solution implicitly fuses geometric and semantic information in the features extracted for place recognition. Another example of a multi-task architecture, in the domain of assistive technologies, is demonstrated in [2], which introduces a model with a single backbone and two heads, one for VPR and the other for scene recognition.…”
Section: Multi-task Architectures (mentioning, confidence: 99%)
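The single-backbone, two-head design that [2] describes can be sketched concretely. Below is a minimal PyTorch sketch, assuming a MobileNetV2 backbone, a 256-dimensional descriptor, and 10 scene classes (all illustrative choices, not the cited paper's exact configuration): one head emits an L2-normalized global descriptor for VPR retrieval, the other emits scene-class logits.

```python
# Minimal sketch of a single backbone with two heads (VPR + scene recognition).
# Backbone choice, descriptor size, and class count are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import mobilenet_v2

class LocalizeAndRecognize(nn.Module):
    def __init__(self, descriptor_dim=256, num_scene_classes=10):
        super().__init__()
        self.backbone = mobilenet_v2(weights=None).features   # shared compact backbone
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.vpr_head = nn.Linear(1280, descriptor_dim)       # global descriptor for retrieval
        self.scene_head = nn.Linear(1280, num_scene_classes)  # scene-class logits

    def forward(self, x):
        feat = self.pool(self.backbone(x)).flatten(1)         # (B, 1280)
        descriptor = F.normalize(self.vpr_head(feat), dim=1)  # L2-normalized for cosine matching
        return descriptor, self.scene_head(feat)

model = LocalizeAndRecognize()
desc, logits = model(torch.randn(1, 3, 224, 224))             # desc: (1, 256), logits: (1, 10)
```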
“…Pertaining to the last point, recognizing places by vision is regarded as a key component of localization and navigation, being used for loop closure in SLAM algorithms in GPS-denied environments as well as an input for learning navigation policies [1] under different conditions. Remarkably, the development of visual localization in robotics is also paving the way for new applications of VPR, such as assistive technologies for people with visual impairments [2].…”
Section: Introduction (mentioning, confidence: 99%)
“…They employed an external GPS tracker to determine the user's location, using a u-blox NEO-6M chip with a location accuracy of less than 0.4 m. A second approach is image-based positioning. This approach determines the user's location by querying a captured image against a dataset that contains images and location information [36, 37, 38, 58]. V-Eye [39] used visual simultaneous localization and mapping (SLAM) and model-based localization (MBL) to localize the BVIP with a median error of approximately 0.27 m.…”
Section: Journey Planning (mentioning, confidence: 99%)
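A minimal sketch of that image-based positioning idea, assuming descriptors have already been extracted for both the query image and the geotagged database images (the random vectors and the (lat, lon) table below merely stand in for real features and map data):

```python
# Image-based positioning as nearest-neighbour retrieval: match a query
# descriptor against a database of geotagged image descriptors and return
# the best match's stored coordinates.
import numpy as np

def localize(query_desc, db_descs, db_locations):
    """query_desc: (D,) L2-normalized; db_descs: (N, D) L2-normalized;
    db_locations: (N, 2) array of (lat, lon) per database image."""
    sims = db_descs @ query_desc           # cosine similarity to each DB image
    best = int(np.argmax(sims))            # most similar reference image
    return db_locations[best], sims[best]

# Toy usage with synthetic descriptors in place of real image features.
rng = np.random.default_rng(0)
db = rng.normal(size=(100, 256))
db /= np.linalg.norm(db, axis=1, keepdims=True)
locs = rng.uniform(size=(100, 2))
q = db[42] + 0.01 * rng.normal(size=256)   # noisy re-observation of image 42
q /= np.linalg.norm(q)
print(localize(q, db, locs))
```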
“…There are several research approaches used to help BVIP interpret their immediate environments, such as scene recognition [58], multi-object detection [42], and scene captioning [43]. Scene recognition classifies an image into predefined classes [58], while multi-object detection detects multiple objects in a single image [42]. Scene captioning is considered the most suitable in this case, as it describes objects in their context (environment) and their relations in a sentence [129].…”
Section: Real-time Navigation (mentioning, confidence: 99%)
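To make the contrast concrete, here is a schematic sketch of the three output types; the structures are assumptions for illustration only, not any cited system's interface:

```python
# Schematic output types for the three approaches (illustrative structures).
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class SceneRecognitionResult:     # one label for the whole image
    scene_class: str              # e.g. "corridor"

@dataclass
class ObjectDetectionResult:      # several labelled boxes per image
    boxes: List[Tuple[float, float, float, float]]  # (x1, y1, x2, y2)
    labels: List[str]             # one label per box

@dataclass
class SceneCaptionResult:         # objects and their relations, in a sentence
    sentence: str                 # e.g. "a chair next to a table by the window"
```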
“…First, candidate-region-based object detection methods, such as Hybrid Task Cascade [13], CenterMask [14], and PolyTransform [15]; second, regression-based object detection methods, such as YOLO [16,17], SSD [18], and FPN [19]; and third, search-based object detection methods, such as AttentionNet [20] and reinforcement-learning-based object detection algorithms [21]. Many scholars have incorporated deep learning into technical solutions for indoor positioning and navigation: a fingerprint localization algorithm based on Deep Belief Networks (DBN) with noise reduction achieves target localization in specific indoor environments [22]; deep learning methods automatically encode and extract deep features from Wi-Fi fingerprint data, creating a deep-feature location fingerprint database with one-to-many relationships for indoor localization [23]; and a scene recognition classification step has been added to a visual localization system [24]. At present, the image quality, pixel resolution, and sensor and aperture performance of video frames captured by cell-phone cameras have improved significantly.…”
Section: Introduction (mentioning, confidence: 99%)
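As a simplified sketch of the fingerprint-database idea behind [22] and [23]: offline, reference positions are paired with Wi-Fi RSSI vectors; online, a measured vector is matched to its k nearest fingerprints and their positions are averaged. The cited works replace this plain matching with DBN and deep-feature encodings; the kNN below is only an illustrative stand-in.

```python
# Fingerprint-based indoor localization via k-nearest-neighbour matching.
# The fingerprint database and k are illustrative assumptions.
import numpy as np

def knn_fingerprint_localize(rssi, db_fingerprints, db_positions, k=3):
    """rssi: (A,) measured RSSI per access point; db_fingerprints: (N, A);
    db_positions: (N, 2) reference (x, y) coordinates."""
    dists = np.linalg.norm(db_fingerprints - rssi, axis=1)
    nearest = np.argsort(dists)[:k]        # k most similar stored fingerprints
    return db_positions[nearest].mean(axis=0)

# Toy usage: 4 reference points, 3 access points.
db_fp = np.array([[-40., -60., -70.], [-55., -45., -65.],
                  [-70., -50., -40.], [-60., -70., -50.]])
db_xy = np.array([[0., 0.], [0., 5.], [5., 5.], [5., 0.]])
print(knn_fingerprint_localize(np.array([-42., -58., -68.]), db_fp, db_xy, k=2))
```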