Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d18-1540
Mapping natural language commands to web elements

Abstract: The web provides a rich, open-domain environment with textual, structural, and spatial properties. We propose a new task for grounding language in this environment: given a natural language command (e.g., "click on the second article"), choose the correct element on the web page (e.g., a hyperlink or text box). We collected a dataset of over 50,000 commands that capture various phenomena such as functional references (e.g. "find who made this site"), relational reasoning (e.g. "article by john"), and visual re…
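The grounding task described in the abstract can be illustrated with a trivial lexical-overlap baseline. This sketch is not the paper's model: the element data, the `score` function, and the dictionary fields are illustrative assumptions, chosen only to show the input/output shape of the task (command in, page element out).

```python
# Illustrative baseline (NOT the paper's model): ground a natural
# language command to a page element by scoring token overlap between
# the command and each element's visible text.

def score(command, element_text):
    """Fraction of command tokens that also appear in the element's text."""
    cmd_tokens = set(command.lower().split())
    elem_tokens = set(element_text.lower().split())
    return len(cmd_tokens & elem_tokens) / max(len(cmd_tokens), 1)

def ground(command, elements):
    """Return the candidate element whose text best matches the command."""
    return max(elements, key=lambda e: score(command, e["text"]))

# Hypothetical flattened page: each element has a tag and its text.
elements = [
    {"tag": "a", "text": "Read the second article"},
    {"tag": "input", "text": "Search box"},
    {"tag": "a", "text": "About this site"},
]

best = ground("click on the second article", elements)
print(best["tag"], best["text"])  # the hyperlink wins on token overlap
```

A real system, as the abstract notes, must also exploit structural and spatial properties of the page (and functional or relational references), which pure text matching cannot capture.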

Cited by 28 publications (17 citation statements)
References 34 publications (32 reference statements)
“…Sikuli uses screenshots to refer to the GUI elements for automation [34]. Neural networks have been proposed to map high-level verbal descriptions of web elements (the text of the element, its graphical attributes, and its relative position to other elements on the page) to specific graphical elements [23,28]. Recently, we showed it is more accurate to use a neural network to first translate the natural-language description to a formal semantic representation, which is then used algorithmically to identify the element of interest in the target web page [33].…”
Section: PbD for Automation (mentioning)
confidence: 99%
“…Although the implementation of generating app GUI screenshot confirmations used in SOVITE, as described above, only applies to programming-by-demonstration instructable agents such as SUGILITE [35], PLOW [1], and VASTA [58], there are other feasible approaches for generating app GUI screenshot confirmations in other types of agents. For example, recent advances in machine learning have been shown to support directly matching natural language commands to specific GUI elements [52] and generating semantic labels for GUI elements from screenshots [13]. For agents that use web API calls to fulfill the task intents, it is also feasible to compare the agent API calls to the API calls made by apps by analyzing the code of the apps (e.g., CHABADA [20]), or to the network traffic collected from the apps (e.g., MobiPurpose [28]).…”
Section: Generating the App GUI Screenshot Confirmations (mentioning)
confidence: 99%
“…Some works have made early progress in this domain (Liu et al., 2018b; Deka et al., 2016), thanks to the availability of large datasets of GUIs like RICO (Deka et al., 2017). Recent reinforcement learning-based approaches and semantic parsing techniques have also shown promising results in learning models for navigating through GUIs for user-specified task objectives (Liu et al., 2018a; Pasupat et al., 2018). For ITL, an interesting future challenge is to combine these user-independent, domain-agnostic machine-learned models with the user's personalized instructions for a specific task.…”
Section: Extracting Task Semantics from GUIs (mentioning)
confidence: 99%