2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)
DOI: 10.1109/iccvw54120.2021.00202
Trans4Trans: Efficient Transformer for Transparent Object Segmentation to Help Visually Impaired People Navigate in the Real World

Cited by 65 publications (23 citation statements). References 49 publications.
“…In Table 8, we benchmark more than 10 semantic segmentation methods. We compare our models against RGB-only methods covering CNN-based SwiftNet [125], Fast-SCNN [126], CGNet [127], and DeepLabV3+ [128], as well as transformer-based Swin [24], SegFormer [59], and Trans4Trans [3]. We also include multimodal methods, spanning RFNet [1] designed for road-driving scene segmentation and ISSAFE [12], the only known RGB-Event method designed for traffic accident scene segmentation, as well as SA-Gate [8], a state-of-the-art RGB-D segmentation method.…”
Section: Results on RGB-Event Dataset (mentioning)
confidence: 99%
“…Semantic segmentation is an essential task in computer vision, which aims to transform an image input into its underlying semantically meaningful regions and enables pixel-wise dense scene understanding for many real-world applications such as automated vehicles, robotics navigation, and augmented reality [1], [2], [3]. Over the last few years, pixel-wise semantic segmentation of RGB images has gained an increasing amount of attention and made significant progress on segmentation accuracy [4], [5], [6].…”
Section: Introduction (mentioning)
confidence: 99%
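The passage above describes the core output of semantic segmentation: one class label per pixel of the input image. The following is a minimal sketch of such a pixel-wise prediction, assuming a pretrained torchvision DeepLabV3-ResNet50 model and a hypothetical input file scene.jpg; it illustrates the task itself, not the Trans4Trans architecture or the cited authors' pipeline.

# Minimal pixel-wise segmentation sketch (model choice and input path are assumptions).
import torch
from torchvision import transforms
from torchvision.models.segmentation import deeplabv3_resnet50
from PIL import Image

# Load a pretrained segmentation model and switch to inference mode.
model = deeplabv3_resnet50(weights="DEFAULT").eval()

# Standard ImageNet normalization expected by the pretrained backbone.
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image = Image.open("scene.jpg").convert("RGB")   # hypothetical input image
batch = preprocess(image).unsqueeze(0)           # shape: (1, 3, H, W)

with torch.no_grad():
    logits = model(batch)["out"]                 # shape: (1, num_classes, H, W)

# One class id per pixel: the "semantically meaningful regions" mentioned above.
labels = logits.argmax(dim=1).squeeze(0)         # shape: (H, W)
print(labels.shape, labels.unique())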
“…Moreover, when a moving object gets closer to the user and its speed is relatively fast, a reminder of the potential risk can be passed to the user. Objects with high velocity are often dangerous for visually impaired people, and this velocity information can enhance current obstacle avoidance modules that mainly use depth information [12,19]. We also designed a questionnaire regarding the expected form of feedback from our system.…”
Section: Discussion (mentioning)
confidence: 99%
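The statement above combines two cues, proximity (from depth) and closing speed, to decide when a warning should reach the user. Below is a hypothetical sketch of that decision rule; the TrackedObject fields, the should_warn helper, and the thresholds are assumptions made for illustration, not the cited system's implementation.

# Hypothetical risk-alert rule: warn when an object is both near and approaching fast.
from dataclasses import dataclass

@dataclass
class TrackedObject:
    label: str        # e.g. "person", "door" (hypothetical class names)
    depth_m: float    # current distance from the user, in metres
    speed_mps: float  # closing speed toward the user, in metres per second

def should_warn(obj: TrackedObject,
                near_m: float = 2.0,
                fast_mps: float = 1.0) -> bool:
    """Return True if the object is both close and moving toward the user quickly."""
    return obj.depth_m < near_m and obj.speed_mps > fast_mps

# Example usage with made-up measurements:
if should_warn(TrackedObject("person", depth_m=1.5, speed_mps=1.4)):
    print("Alert: fast-moving obstacle ahead")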
“…Besides, visually impaired people find it hard to maintain a proper social distance from others during the COVID-19 pandemic [13]. Some assistance systems tackle this issue through Simultaneous Localization and Mapping (SLAM) and deep learning approaches [12,19] to provide accurate guidance to visually impaired people, but they are less effective in highly dynamic scenarios. To address this problem, we propose a system to help people with visual impairments perceive dynamic objects in indoor environments and understand their motion.…”
Section: Introduction (mentioning)
confidence: 99%
“…in adverse conditions [20], [21]. Previous standard UDA and DG approaches are of course inapplicable, as they require access to source data, which is usually unavailable in a highly automated vehicle due to storage limitations.…”
Section: Introduction (mentioning)
confidence: 99%