A single fault in a telecommunication network frequently results in a number of alarms being reported to the network operator. This multitude of alarms can easily obscure the real cause of the fault. In addition, when multiple faults occur at approximately the same time, it can be difficult to determine how many faults have occurred, creating the possibility that some may be missed. A variety of solution approaches have been proposed in the literature; however, practically deployable commercial solutions remain elusive. The experiences of the Network Fault and Alarm Correlator and Tester (NetFACT) project, carried out at IBM Research and described in this paper, provide some insight into why this is the case and what must be done to overcome the barriers encountered. Our observations are based on experimental use of the NetFACT system to process a live, continuous alarm stream from a portion of the Advantis physical backbone network, one of the largest private telecommunications networks in the world. The NetFACT software processes the incoming alarm stream and determines the faults from the alarms. It attempts to narrow down the likely root causes of each fault, to the greatest extent possible given the available information. To accomplish this, NetFACT employs a novel combination of diagnostic techniques supported by an object-oriented model of the network being managed. This model provides an abstract view of the underlying network of heterogeneous devices. A number of issues were explored in the project, including the extensibility of the design to other types of networks and the impact of the practical realities that must be addressed if prototype systems such as NetFACT are to lead to commercial products.
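To illustrate the general idea of an object-oriented network model supporting alarm correlation, the following is a minimal, hypothetical Python sketch. The class and function names (NetworkElement, Alarm, correlate_alarms) and the simple "blame the highest alarming ancestor" heuristic are illustrative assumptions only; they do not describe NetFACT's actual design or diagnostic techniques.

```python
# Hypothetical sketch: an object-oriented network model with a naive
# alarm-correlation pass. Names and logic are illustrative only.
from dataclasses import dataclass, field
from typing import Optional, List, Dict

@dataclass(eq=False)  # identity-based hashing so elements can key dicts/sets
class NetworkElement:
    """A managed device (or link) in the abstract network model."""
    name: str
    parent: Optional["NetworkElement"] = None       # element this one depends on
    children: List["NetworkElement"] = field(default_factory=list)

    def add_child(self, child: "NetworkElement") -> None:
        child.parent = self
        self.children.append(child)

@dataclass
class Alarm:
    source: NetworkElement
    message: str

def correlate_alarms(alarms: List[Alarm]) -> Dict[NetworkElement, List[Alarm]]:
    """Group alarms under the highest ancestor that also raised an alarm.

    That ancestor is treated as the likely root cause; alarms reported by
    its dependent elements are attributed to the same underlying fault.
    """
    alarmed = {a.source for a in alarms}
    groups: Dict[NetworkElement, List[Alarm]] = {}
    for alarm in alarms:
        root = alarm.source
        node = alarm.source.parent
        while node is not None:
            if node in alarmed:
                root = node
            node = node.parent
        groups.setdefault(root, []).append(alarm)
    return groups

if __name__ == "__main__":
    trunk, mux, port = NetworkElement("trunk-1"), NetworkElement("mux-7"), NetworkElement("port-3")
    trunk.add_child(mux)
    mux.add_child(port)
    alarms = [Alarm(port, "loss of signal"),
              Alarm(mux, "loss of frame"),
              Alarm(trunk, "carrier failure")]
    for cause, related in correlate_alarms(alarms).items():
        print(cause.name, "<-", [a.message for a in related])
```

In this toy example, all three alarms are grouped under trunk-1, the one element whose own ancestors raised no alarms, reducing the alarm flood to a single candidate root cause.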
Information seeking is an important but often difficult task, especially when it involves large and complex data sets. We hypothesize that a context-sensitive interaction paradigm would greatly assist users in their information seeking. Such a paradigm would allow users to both express their requests and receive requested information in context. Driven by this hypothesis, we have taken rigorous steps to design, develop, and evaluate a full-fledged, context-sensitive information system. We started with a Wizard-of-Oz (WOZ) study to verify the effectiveness of our envisioned system. We then built a fully automated system based on the findings from our WOZ study. We targeted the development and integration of two sets of technologies: context-sensitive multimodal input interpretation and multimedia output generation. Finally, we formally evaluated the usability of our system in real-world conditions. The results show that our system greatly improves users' ability to perform practical information-seeking tasks. These results not only confirm our initial hypothesis, but also indicate the practicality of our approaches.
Multimodal conversation systems allow users to interact with computers effectively using multiple modalities, such as natural language and gesture. However, these systems have not been widely used in practical applications, mainly due to their limited input understanding capability. As a result, conversation systems often fail to understand user requests and leave users frustrated. To address this issue, most existing approaches focus on improving a system's interpretation capability. Nonetheless, such improvements may still be limited, since no interpreter can cover the entire range of possible input expressions. Alternatively, we present a two-way adaptation framework that allows both users and systems to dynamically adapt to each other's capabilities and needs during the course of interaction. Compared to existing methods, our approach offers two unique contributions. First, it improves the usability and robustness of a conversation system by helping users dynamically learn the system's capabilities in context. Second, our approach enhances the overall interpretation capability of a conversation system by learning new user expressions on the fly. Our preliminary evaluation shows the promise of this approach.
Introduction

Multimodal interfaces allow humans to interact with machines through multiple modalities such as speech, gesture, and gaze. Studies have shown that these interfaces support more effective human-computer interaction, for example, by reducing task completion time and task error rates [11]. Inspired by earlier work (e.g., [2,4,8,13]), we are building an intelligent infrastructure, called Responsive Information Architect (RIA), which can engage users in a multimodal conversation. Currently, RIA is embodied in a testbed, called Real Hunter™, a real-estate application for helping users find residential properties. Figure 1 shows RIA's main components. A user can interact with RIA using multiple input channels, such as speech and gesture. First, a multimodal interpreter exploits various contexts (e.g., conversation history) to produce an interpretation frame that captures the meanings of user inputs. Based on the interpretation frame, a conversation facilitator decides how RIA should act by generating a set of conversation acts (e.g., Describe information to the user). Upon receiving the conversation acts, a presentation broker sketches a presentation draft that expresses the outline of a multimedia presentation. Based on this draft, a language designer and a visual designer work together to author a multimedia blueprint that contains a fully coordinated and detailed multimedia presentation. The blueprint is then sent to a producer to be realized. To support all the components described above, an information server supplies various contextual information, including domain data (e.g., houses and cities for a real-estate application), a conversation history (e.g., detailed conversation exchanges between RIA and a user), a user model (e.g., user profiles), and an environment model (e.g., device capabilities).

Our focus in this paper is on the interpretation of multimodal user inputs. Specifically, we are developing a semantics-based multimodal interpretation framework called MIND (Multimodal Interpreter for Natural Dialog). Most existing work on multimodal interpretation focuses on interpreting user inputs through modality integration, e.g., merging speech with gesture (e.g., [2,4,8]), without considering interaction contexts (although such contexts have been used extensively in spoken dialog systems [1,14]). In a conversation setting, user inputs are often imprecise or abbreviated, and integrating the meanings of individual modalities alone sometimes cannot produce a full understanding of those inputs. Therefore, MIND applies a context-based approach that uses a variety of contexts (e.g., domain context and conversation context) to enhance multimodal fusion.

Specifically, MIND supports three major processes: unimodal understanding, multimodal understanding, and discourse understanding (Figure 2). First, in unimodal understanding, an array of recognizers (e.g., a speech recognizer) convert input signals (e.g., speech signals) into modality-specific outputs (e.g., text). These outputs are then processed by modality-sp...
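To make the staged flow above concrete, here is a minimal, hypothetical Python sketch of context-based multimodal interpretation in the spirit of MIND: modality-specific outputs are fused into an interpretation frame, and unresolved references are then filled in from the conversation history. All names (UnimodalOutput, InterpretationFrame, fuse, resolve_with_context) are illustrative assumptions, not RIA's or MIND's actual API.

```python
# Hypothetical sketch of context-based multimodal interpretation,
# loosely following three stages: unimodal understanding (assumed done
# by upstream recognizers), multimodal understanding (fuse), and
# discourse understanding (resolve_with_context). Names are illustrative.
from dataclasses import dataclass
from typing import Optional, List

@dataclass
class UnimodalOutput:
    modality: str                       # e.g., "speech" or "gesture"
    intent: Optional[str] = None        # e.g., "ask_price"
    referent: Optional[str] = None      # e.g., a house id selected by a gesture

@dataclass
class InterpretationFrame:
    intent: Optional[str] = None
    referent: Optional[str] = None

def fuse(outputs: List[UnimodalOutput]) -> InterpretationFrame:
    """Multimodal understanding: merge meanings from individual modalities."""
    frame = InterpretationFrame()
    for out in outputs:
        frame.intent = frame.intent or out.intent
        frame.referent = frame.referent or out.referent
    return frame

def resolve_with_context(frame: InterpretationFrame,
                         history: List[InterpretationFrame]) -> InterpretationFrame:
    """Discourse understanding: fill a missing referent (e.g., "how much is
    it?") from the most recent turn that mentioned one."""
    if frame.referent is None:
        for past in reversed(history):
            if past.referent is not None:
                frame.referent = past.referent
                break
    return frame

if __name__ == "__main__":
    history = [InterpretationFrame(intent="show_detail", referent="house-42")]
    turn = [UnimodalOutput("speech", intent="ask_price")]   # "How much is it?"
    print(resolve_with_context(fuse(turn), history))
    # InterpretationFrame(intent='ask_price', referent='house-42')
```

The example shows why modality fusion alone can fall short: the spoken request carries no referent, and only the conversation context supplies the house being asked about.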