2022
DOI: 10.1007/978-3-031-20074-8_18

A Dataset for Interactive Vision-Language Navigation with Unknown Command Feasibility

Cited by 10 publications (5 citation statements)
References 32 publications
“…In order to help visually impaired individuals understand the meaning of GUI pages and components, researchers have attempted to use computational vision technology for GUI modeling and semantic understanding of GUI pages [16,19,28,41,44,46,51,60,70,131,136,142]. Schoop et al. [115] designed a novel system that models the perceived tappability of mobile UI elements with a vision-based deep neural network and helps provide design insights with dataset-level and instance-level explanations of model predictions.…”
Section: GUI Understanding and Intelligent
mentioning, confidence: 99%
“…Web Navigation and Question-Answering: The web navigation task (Toyama et al. 2021; Yao et al. 2022; Burns et al. 2022) involves developing algorithms or models that enable automated agents to navigate and interact with websites on the Internet. There are some related datasets (Liu et al. 2018; Xu et al. 2021; Mazumder and Riva 2020; Yao et al. 2022; Deng et al. 2023; …).…”
Section: Related Work
mentioning, confidence: 99%
“…Specifically, for Embodied AI datasets, we consider R2R (Anderson et al. 2018), REVERIE (Qi et al. 2020b) and EQA (Das et al. 2018), where the first two are widely used vision-and-language navigation (VLN) datasets while the last one is a well-known embodied question answering dataset. Regarding app-based datasets, we compare PixelHelp (Li et al. 2020), MoTIF (Burns et al. 2022) and META-GUI (Sun et al. 2022). As for websites, we consider seven datasets for a comprehensive comparison, including MiniWoB++ (Liu et al. 2018), RUSS (Xu et al. 2021), FLIN (Mazumder and Riva 2020), WebShop (Yao et al. 2022), MIND2WEB (Deng et al. 2023), WebQA (Chang et al. 2022), and ScreenQA (Hsiao et al. 2022).…”
Section: WebVLN-v1 Dataset Analysis: WebVLN-v1 Dataset vs. Related Data...
mentioning, confidence: 99%
“…MMC4 has not yet been used for pretraining or downstream applications. In mobile apps, the closest domain to webpages, there are two open-source datasets that contain all modalities (text, image, and structure): Rico (Deka et al., 2017) and MoTIF (Burns et al., 2022).…”
Section: Related Work
mentioning, confidence: 99%