Text streams demand an effective, interactive, on-the-fly method for exploring massive, dynamic data sets while extracting valuable information for visual analysis. In this paper, we propose an interactive visualization system that enables users to explore streaming text documents without prior knowledge of the data. The system constantly incorporates incoming documents from a continuous source into the existing visualization context, which is "physically" achieved by minimizing a potential energy defined from similarities between documents. Unlike most existing methods, our system uses dynamic keyword vectors to incorporate newly introduced keywords from data streams. Furthermore, we propose a keyword importance measure that lets users adjust the similarity on-the-fly and hence achieve their preferred visual effects in accordance with varying interests, which also helps to identify hot spots and outliers. We optimize system performance through a similarity grid and a parallel implementation on graphics hardware (GPU), which achieves instantaneous animated visualization even for a very large data collection. Moreover, our system implements a powerful user interface enabling various user interactions for in-depth data analysis. Experiments and case studies are presented to illustrate our dynamic system for text stream exploration.
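The abstract above does not give the energy function or the similarity formula, so the following is only a minimal sketch of the general idea, under stated assumptions: documents are sparse keyword-weight dictionaries (so new keywords can appear at any time), similarity is a cosine weighted by user-adjustable keyword importance, and the "potential energy" is a simple squared difference between on-screen distance and `1 - similarity`. The names `weighted_cosine`, `layout_energy`, and `relax` are hypothetical and not taken from the paper.

```python
import math


def weighted_cosine(a, b, importance):
    """Cosine similarity of two sparse keyword dicts.

    `importance` maps keyword -> non-negative weight (default 1.0).
    Assumed form, not the paper's definition.
    """
    def w(k):
        return importance.get(k, 1.0)

    dot = sum(w(k) * v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(w(k) * v * v for k, v in a.items()))
    norm_b = math.sqrt(sum(w(k) * v * v for k, v in b.items()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def _targets(docs, importance):
    # Ideal pairwise distance: similar documents should sit close together.
    n = len(docs)
    return {
        (i, j): 1.0 - weighted_cosine(docs[i], docs[j], importance)
        for i in range(n)
        for j in range(i + 1, n)
    }


def layout_energy(positions, docs, importance):
    """Assumed energy: sum over pairs of (distance - target distance)^2."""
    total = 0.0
    for (i, j), target in _targets(docs, importance).items():
        dist = math.hypot(positions[i][0] - positions[j][0],
                          positions[i][1] - positions[j][1])
        total += (dist - target) ** 2
    return total


def relax(positions, docs, importance, step=0.05, iterations=200):
    """Plain gradient descent on the assumed energy; returns new 2D positions."""
    pos = [list(p) for p in positions]
    targets = _targets(docs, importance)
    for _ in range(iterations):
        grads = [[0.0, 0.0] for _ in pos]
        for (i, j), target in targets.items():
            dx = pos[i][0] - pos[j][0]
            dy = pos[i][1] - pos[j][1]
            dist = math.hypot(dx, dy)
            if dist == 0.0:
                continue  # coincident points give no direction to move in
            g = 2.0 * (dist - target) / dist
            grads[i][0] += g * dx
            grads[i][1] += g * dy
            grads[j][0] -= g * dx
            grads[j][1] -= g * dy
        for p, grad in zip(pos, grads):
            p[0] -= step * grad[0]
            p[1] -= step * grad[1]
    return pos


docs = [
    {"flood": 1.0, "rain": 1.0},
    {"flood": 1.0, "storm": 1.0},
    {"election": 1.0, "vote": 1.0},  # introduces keywords unseen so far
]
start = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
settled = relax(start, docs, importance={})
```

In this sketch, raising the importance of a shared keyword (for example `{"flood": 4.0}`) increases the similarity of the first two documents, so relaxing again would pull them closer together. The similarity grid and GPU parallelization mentioned in the abstract are not modeled here; the pairwise loop is quadratic and only suitable for illustration.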
An interactive visualization system, STREAMIT, enables users to explore text streams on-the-fly without prior knowledge of the data. It incorporates incoming documents from a continuous source into the existing visualization context, with automatic grouping and separation based on document similarities. STREAMIT supports interactive exploration with good scalability. First, keyword importance is adjustable on-the-fly, so users can obtain preferred clustering effects for varying interests. Second, topic modeling is used to represent the documents with higher-level semantic meanings. Third, document clusters are generated to promote better understanding. The system performance is optimized to achieve instantaneous animated visualization even for a very large data collection. STREAMIT provides a powerful user interface for in-depth data analysis. Case studies are presented to demonstrate the effectiveness of STREAMIT.
Many text collections with temporal references, such as news corpora and weblogs, are generated to report and discuss real-life events. Thus, event-related tasks, such as detecting the real-life events that drive the generation of the text documents, tracking event evolution, and investigating reports and commentaries about events of interest, are important when exploring such text collections. To incorporate and leverage human effort in conducting such tasks, we propose a novel visual analytics approach named EventRiver. EventRiver integrates event-based automated text analysis and visualization to reveal the events motivating the text generation and the long-term stories they construct. Within the visualization, users can interactively conduct tasks such as event browsing, tracking, association, and investigation. A working prototype of EventRiver has been implemented for exploring news corpora. A set of case studies, experiments, and a preliminary user test have been conducted to evaluate its effectiveness and efficiency.
In this paper, we present a novel visual analytics system named Newdle with a focus on exploring large online news collections in which the semantics of the individual news articles have already been tagged. Newdle automatically conducts clustering and relation analyses on news articles, then builds visualizations and supports interactions on top of these analyses. By providing a novel topic overview in which the semantics and temporal features of the significant article clusters in a large collection are intuitively displayed, Newdle allows users to grasp the content of the collection at a glance. Through the rich set of interactions and visualizations provided by Newdle, users can effectively conduct in-depth analyses of topics, tags, and articles of interest. We have implemented a fully working prototype of Newdle, using the online New York Times RSS feeds as its example data input. We present several case studies to illustrate the effectiveness and efficiency of Newdle.