“…SOVITE bridges an important gap in these systems, as they focus on the ambiguities, uncertainties, and vagueness embedded in the user instructions of In terms of the technique used, SOVITE extracts the semantics of app GUIs [13,43] for grounding natural language conversations. Compared with the previous systems that used the semantics of app GUIs for learning new tasks [36,38,58], extracting task flows [40], and supporting invoking individual GUI widgets with voice commands [64], a new idea in SOVITE is that it encodes app GUIs into the same vector space as natural language utterances, allowing the system to look up semantically relevant task intents when the user refers to apps and app GUI screens in the dialogues for repairing intent detection errors (details in the Implementation section).…”
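
The shared-vector-space idea in the excerpt can be illustrated with a minimal sketch. This is not SOVITE's implementation: the encoder here is a toy bag-of-words counter standing in for whatever learned embedding model the system uses, and the names `embed`, `cosine`, `INTENTS`, and `nearest_intent` are hypothetical. The sketch only shows the lookup pattern, in which the text of a GUI screen and the candidate intent utterances are mapped by the same encoder and the closest intent is returned.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in encoder (assumption): lowercase bag-of-words counts.
    A real system would use a learned sentence or screen embedding."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical task intents, each described by one example utterance.
INTENTS = {
    "order_coffee": "order a cup of coffee",
    "book_hotel": "book a hotel room",
    "request_ride": "request a ride to a destination",
}

def nearest_intent(screen_labels: list[str]) -> str:
    """Encode the visible labels of a GUI screen with the same encoder
    used for utterances, then return the most similar intent name."""
    screen_vec = embed(" ".join(screen_labels))
    return max(INTENTS, key=lambda name: cosine(screen_vec, embed(INTENTS[name])))

# The user points at a hotel app screen while repairing a misdetected intent.
print(nearest_intent(["Hotel search", "Check-in date", "Room type", "Book now"]))
# book_hotel
```

Because screens and utterances share one encoder, no hand-written mapping from apps to intents is needed in this sketch; swapping the toy `embed` for a learned model would leave the lookup logic unchanged.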