Abstract: A major problem in task-oriented conversational agents is the lack of support for the repair of conversational breakdowns. Prior studies have shown that current repair strategies for these kinds of errors are often ineffective due to: (1) the lack of transparency about the state of the system's understanding of the user's utterance; and (2) the system's limited capabilities to understand the user's verbal attempts to repair natural language understanding errors. This paper introduces SOVITE, a new multi-modal …
“…Recently, Li et al. [37] explored multi-modal strategies in the context of existing mobile app Graphical User Interfaces (GUIs) for fixing Natural Language Understanding (NLU) breakdowns and command disambiguations [36]. In particular, one of their system solutions (Figure 2.1.a)…”
Section: User Perceptions Of Task-oriented Chatbots (mentioning, confidence: 99%)
“…We hypothesize that explaining the competencies and limitations of the chatbot using the identified intent and entity will not only help users recognize the breakdown but also improve transparency. Furthermore, within the breakdown decision, an indication of where the problem occurred and its possible causes would help users more clearly understand the cause of the breakdown and repair their queries [37].…”
Section: Structuring the Explanation With Intent And Entity (mentioning, confidence: 99%)
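The intent-and-entity structure hypothesized above can be sketched as a simple explanation generator over NLU output. This is a hypothetical illustration, not the studied chatbot's implementation: the result fields (`intent`, `confidence`, `entities`), the slot lists, and the 0.5 confidence threshold are all assumptions made for the sketch.

```python
def explain_breakdown(nlu_result, supported_intents):
    """Turn an NLU result into a breakdown explanation.

    supported_intents maps each intent name to its required slots,
    so the message can say *where* the problem occurred: an unsupported
    intent, a low-confidence match, or missing entities.
    """
    intent = nlu_result.get("intent")
    confidence = nlu_result.get("confidence", 0.0)
    entities = nlu_result.get("entities", {})

    if intent not in supported_intents:
        # Expose the chatbot's competencies and limitations.
        return ("I can't handle '%s' requests. I can help with: %s."
                % (intent, ", ".join(sorted(supported_intents))))
    if confidence < 0.5:
        # Low-confidence match: surface the guessed intent for repair.
        return ("I think you want to '%s', but I'm not sure. "
                "Could you rephrase?" % intent)
    missing = [slot for slot in supported_intents[intent]
               if slot not in entities]
    if missing:
        # Point at the specific entities still needed.
        return "To '%s', I still need: %s." % (intent, ", ".join(missing))
    return "OK, doing '%s' with %s." % (intent, entities)
```

For example, a `book_flight` query with only a destination would yield a message naming the missing `date` slot, which is the kind of cause-specific indication the snippet argues helps users repair their queries.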
“…Consequently, inspired by the "research through design" [67] paradigm, we contributed a minimum viable implementation of a chatbot that allowed us to explore how visual explanations should be designed. Through our implementation, we were able to augment the chatbot with different visual in-context explanations that could support a range of infeasible and disambiguation tasks in the user study and to assess them with users; however, more work remains to be done at the intersection of ML and HCI to build upon recent work [37]. Nevertheless, our designs could be used to map user intents to specific portions of GUIs and to interaction examples from other users, and could therefore be adapted to feature-rich applications besides spreadsheets that have similar UIs and menu structures.…”
Section: Future Work: Designing a Hybrid Of Visual Tour And Nontour Mode (mentioning, confidence: 99%)
“…However, the complexities of natural language interactions [3, 51], limited training sets, and poor conversational understanding [2] remain key obstacles to fully realizing the potential of human-chatbot interaction. For example, when interacting with task-oriented chatbots, a key challenge for users is dealing with conversational dead-ends or breakdowns [5, 36, 37]. In fact, during a conversational breakdown, as many as 70% of users may opt to quit the task or abandon the chatbot completely, while others may try to rephrase their queries with little or no success [51].…”
“…The combination of voice and touch enhanced the experience on mobile devices. In addition, multimodal methods were also implemented to improve the performance of disambiguation interfaces [32, 35, 45].…”
Editing operations such as cut, copy, paste, and correcting errors in typed text are often tedious and challenging to perform on smartphones. In this paper, we present VT, a voice- and touch-based multi-modal text editing and correction method for smartphones. To edit text with VT, the user glides over a text fragment with a finger and dictates a command, such as “bold” to change the format of the fragment, or taps inside a text area and speaks a command such as “highlight this paragraph.” To correct text, the user taps approximately on the erroneous text fragment and dictates the new content for substitution or insertion. VT combines touch and voice inputs with language context, such as a language model and phrase similarity, to infer the user's editing intention, which lets it handle ambiguities and noisy input signals. This is a major advantage over existing error-correction methods (e.g., iOS's Voice Control), which require precise cursor control or text selection. Our user studies showed that VT significantly improves the efficiency of text editing and correction on smartphones: it reduced text editing time by 30.80% and text correcting time by 29.97% over the touch-only method, and reduced text editing time by 30.81% and text correcting time by 47.96% over iOS's Voice Control.
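The fusion of touch location and language context that VT describes can be illustrated with a toy scorer: candidate words are ranked by a weighted combination of closeness to the tap position and string similarity to the dictated replacement. This is a hedged sketch, not VT's actual algorithm; the linear weighting, the word-level granularity, and the use of `difflib` similarity (standing in for VT's language model and phrase similarity) are all assumptions.

```python
import difflib

def correct_text(text, tap_index, dictated, w_touch=0.5, w_sim=0.5):
    """Replace the word that best matches a noisy tap plus dictation.

    tap_index is a character offset (the approximate tap location);
    dictated is the spoken replacement. Each word is scored by
    proximity to the tap and similarity to the dictation, so neither
    signal alone has to be precise.
    """
    words = text.split()
    # Recover each word's character offset in the original string.
    offsets, pos = [], 0
    for w in words:
        start = text.index(w, pos)
        offsets.append(start)
        pos = start + len(w)

    def score(i):
        center = offsets[i] + len(words[i]) / 2
        proximity = 1.0 / (1.0 + abs(center - tap_index))  # near the tap
        similarity = difflib.SequenceMatcher(
            None, words[i].lower(), dictated.lower()).ratio()
        return w_touch * proximity + w_sim * similarity

    best = max(range(len(words)), key=score)
    words[best] = dictated
    return " ".join(words)
```

For instance, tapping near the misspelled word in `"the quick brwn fox"` and dictating `"brown"` substitutes the right word even though the tap offset is approximate, which mirrors the paper's point about tolerating imprecise cursor placement.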