Deep reinforcement learning (DRL) has advanced robot manipulation by offering an alternative way to design control strategies that take raw images directly as input. Although an image usually carries richer knowledge about the environment, it requires the policy to perform representation learning and task learning simultaneously, which is sample-inefficient. Previous approaches, such as Variational Autoencoder (VAE) based DRL algorithms, address this problem by learning a visual representation model that encodes the entire image into a low-dimensional vector. However, since the vector contains both robot and object information, coupling within the state is inevitable, which can mislead the training of the DRL policy. In this study, a novel method named Reinforcement Learning with Decoupled State Representation (RLDS) is proposed to decouple the robot and object information, thereby increasing learning efficiency and effectiveness. Experimental results show that the proposed method learns faster and achieves better performance than previous methods on several typical robot tasks. Moreover, with only 3,096 offline images, the proposed method can be successfully applied to a real-robot pushing task, demonstrating its high practicability.