We are interested in tracking changes in large-scale data by periodically creating an agglomerative clustering and examining the evolution of clusters (communities) over time. We examine a large real-world data set: the NEC CiteSeer database, a linked network of >250,000 papers. Tracking changes over time requires a clustering algorithm that produces clusters stable under small perturbations of the input data. However, small perturbations of the CiteSeer data lead to significant changes to most of the clusters. One reason for this is that the order in which papers within communities are combined is somewhat arbitrary. However, certain subsets of papers, called natural communities, correspond to real structure in the CiteSeer database and thus appear in any clustering. By identifying the subset of clusters that remain stable under multiple clustering runs, we get the set of natural communities that we can track over time. We demonstrate that such natural communities allow us to identify emerging communities and track temporal changes in the underlying structure of our network data.

Emergent properties of large linked networks have recently become the focus of intense study. This research is driven by the increasing complexity and importance of large networks, such as the World Wide Web, the electricity grid, and large social networks that capture relationships between individuals. Real-world networks generally exhibit properties that lie somewhere between those of highly structured networks and purely random ones (1-4). So far, most research has focused on using static properties, such as the connectivity of the nodes in the network and the average distance between two nodes, to explain the complex structure. However, these networks generally evolve over time, and so temporal characteristics are a key source of interest.
Our goal in this paper is to provide techniques for the study of the evolution of large linked networks. In our approach, we use agglomerative clusterings of the linked network. By clustering the network at different points in time, we study its temporal evolution. This approach places a new burden on the underlying clustering method. Clustering methods can be surprisingly sensitive to minor changes of the input data. For obtaining a static view of the higher-level structure of the data, such instabilities may be acceptable because the resulting hierarchy often already reveals interesting structure. However, in tracking changes over time, we need to be able to find corresponding communities in clusterings taken from the data at different points in time. If the clusterings are very sensitive to small perturbations of the input data, distinguishing "real" changes from "accidental" changes in the higher-level structure becomes difficult, if not impossible. In the clusterings of our linked network data, we found a large number of relatively random clusters that do not correspond to real community structures. These random clusters obscure the real temporal changes. Fortunately...
We are interested in finding natural communities in large-scale linked networks. Our ultimate goal is to track changes over time in such communities. For such temporal tracking, we require a clustering algorithm that is relatively stable under small perturbations of the input data. We have developed an efficient, scalable agglomerative strategy and applied it to the citation graph of the NEC CiteSeer database (250,000 papers; 4.5 million citations). Agglomerative clustering techniques are known to be unstable on data in which the community structure is not strong. We find that some communities are essentially random and thus unstable, while others are natural and will appear in most clusterings. These natural communities will enable us to track the evolution of communities over time.
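The idea of keeping only clusters that reappear across perturbed clustering runs can be illustrated with a minimal Python sketch. The text above does not specify how stability is measured, so everything here is an assumption made for illustration: the overlap score `match`, the thresholds `min_match` and `min_fraction`, and the toy clusterings are hypothetical, and the clusterings are taken as given rather than produced by an actual agglomerative algorithm.

```python
from typing import FrozenSet, List, Sequence

Cluster = FrozenSet[str]


def match(a: Cluster, b: Cluster) -> float:
    """Overlap score in [0, 1]; equals 1 only when the clusters are identical.

    Assumed measure: the smaller of the two coverage fractions.
    """
    if not a or not b:
        return 0.0
    common = len(a & b)
    return min(common / len(a), common / len(b))


def best_match(cluster: Cluster, clustering: Sequence[Cluster]) -> float:
    """Highest overlap score between `cluster` and any cluster in one run."""
    return max((match(cluster, other) for other in clustering), default=0.0)


def natural_communities(
    reference: Sequence[Cluster],
    perturbed_runs: Sequence[Sequence[Cluster]],
    min_match: float = 0.7,      # hypothetical threshold
    min_fraction: float = 0.6,   # hypothetical threshold
) -> List[Cluster]:
    """Keep reference clusters that reappear in enough perturbed runs."""
    if not perturbed_runs:
        return []
    stable = []
    for cluster in reference:
        # Count the runs that contain a sufficiently similar cluster.
        hits = sum(
            1 for run in perturbed_runs
            if best_match(cluster, run) >= min_match
        )
        if hits / len(perturbed_runs) >= min_fraction:
            stable.append(cluster)
    return stable


# Toy data: one clustering of the full data and three clusterings of
# slightly perturbed data (all invented for this example).
reference = [frozenset("abcd"), frozenset("ef"), frozenset("ghi")]
runs = [
    [frozenset("abc"), frozenset("eg"), frozenset("fhi")],
    [frozenset("abcd"), frozenset("efg"), frozenset("hi")],
    [frozenset("abd"), frozenset("eh"), frozenset("fgi")],
]

stable = natural_communities(reference, runs)
print(stable)
```

In this toy data only the cluster {a, b, c, d} survives: it has a close counterpart in every perturbed run, whereas the other two clusters are reshuffled each time. The same `best_match` helper could in principle be used to pair communities between clusterings taken at different points in time, though that is a design choice of this sketch rather than something the text confirms.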
In a society dominated by mobile phones and still increasing media collections, Interactive Learning is slowly becoming the favored paradigm for managing these collections. Still, however, no scaling Interactive Learning system exists on a mobile phone. In this paper, we present XQC, an Interactive Learning platform with a user interface that fits most modern smartphones, and scales to large media collections.

CCS Concepts: Information systems → Multimedia and multimodal retrieval; Search interfaces; Retrieval on mobile devices.