Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence 2023
DOI: 10.24963/ijcai.2023/164

A Dual Semantic-Aware Recurrent Global-Adaptive Network for Vision-and-Language Navigation

Abstract: Vision-and-Language Navigation (VLN) is a realistic but challenging task that requires an agent to locate the target region using verbal and visual cues. While significant advancements have been achieved recently, there are still two broad limitations: (1) The explicit information mining for significant guiding semantics concealed in both vision and language is still under-explored; (2) The previously structured map method provides the average historical appearance of visited nodes, while it ignores distinctiv…

Citations: cited by 3 publications (6 citation statements)
References: 11 publications
“…Our model shows superior performance across multiple metrics. Notably, compared with the previous state-of-the-art DSRG [29], CausalVLN gains significant improvements in SR (↑ 2.61%) and RGS (↑ 2.13%) on the validation unseen set. Similar enhancements are also observed on the validation seen and test unseen sets.…”
Section: B Implementation Details (mentioning)
confidence: 86%
“…Finally, we utilize the memory-augmented global-local crossmodal fusion module from our previous work DSRG [29] to enable the agent to align and leverage features from different modalities, capturing valuable historical cues throughout the navigation.…”
Section: … … (mentioning)
confidence: 99%