Given a graph, how can we automatically discover roles for nodes? Roles could be, e.g., 'bridges' or 'periphery nodes'. Roles are compact summaries of a node's behavior that generalize across networks. They enable numerous novel and useful network mining tasks, such as sense-making, searching for similar nodes, and node classification. We propose RolX (Role eXtraction), a scalable (linear in the number of edges), unsupervised learning approach for automatically extracting roles from general network data. We demonstrate the effectiveness of RolX on several network mining tasks, from exploratory data analysis to network transfer learning. Moreover, we compare network role discovery with network community discovery and highlight fundamental differences between the two (e.g., roles generalize across disconnected networks; communities do not).
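One way to picture role extraction is to describe every node by structural features and factor the node-by-feature matrix into a few nonnegative roles. The sketch below is a simplified illustration under stated assumptions, not the RolX implementation. It uses only three hand-picked egonet features (degree, edges inside the egonet, edges leaving it), and it fixes the number of roles `r` by hand. RolX, as published, derives features recursively and selects the number of roles automatically, and both steps are omitted here. The factorization is plain Lee-Seung multiplicative-update NMF, and the helper names `structural_features`, `nmf` and `roles` are hypothetical.

```python
import numpy as np

def structural_features(adj):
    """Per-node features: degree, edges inside the egonet, edges leaving the egonet.

    Assumes `adj` is a symmetric 0/1 NumPy array with a zero diagonal.
    """
    n = adj.shape[0]
    feats = np.zeros((n, 3))
    for v in range(n):
        ego = np.append(np.flatnonzero(adj[v]), v)    # neighbors of v plus v itself
        within = adj[np.ix_(ego, ego)].sum() / 2      # undirected edges with both ends in the egonet
        touching = adj[ego].sum()                     # sum of degrees of the egonet nodes
        feats[v] = [len(ego) - 1, within, touching - 2 * within]
    return feats

def nmf(V, r, iters=1000, seed=0, eps=1e-9):
    """Multiplicative updates: V (n x f) ~ G (n x r) @ F (r x f), all entries nonnegative."""
    rng = np.random.default_rng(seed)
    G = rng.random((V.shape[0], r)) + 1e-3
    F = rng.random((r, V.shape[1])) + 1e-3
    for _ in range(iters):
        F *= (G.T @ V) / (G.T @ G @ F + eps)
        G *= (V @ F.T) / (G @ F @ F.T + eps)
    return G, F

def roles(adj, r):
    """Dominant role per node, plus the node-by-role (G) and role-by-feature (F) matrices."""
    G, F = nmf(structural_features(adj), r)
    return G.argmax(axis=1), G, F

if __name__ == "__main__":
    # Two disconnected 4-leaf stars: hubs 0 and 5, leaves 1-4 and 6-9.
    adj = np.zeros((10, 10), dtype=int)
    for hub, leaves in [(0, range(1, 5)), (5, range(6, 10))]:
        for leaf in leaves:
            adj[hub, leaf] = adj[leaf, hub] = 1
    labels, G, F = roles(adj, r=2)
    print(labels)
```

In the toy graph the two hubs sit in different connected components yet have identical feature rows, so the factorization has no basis for telling them apart. This mirrors the abstract's point that roles, unlike communities, carry over between disconnected networks.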
We focus on large graphs where nodes have attributes, such as a social network where the nodes are labelled with each person's job title. In such a setting, we want to find subgraphs that match a user query pattern. For example, a 'star' query would be, "find a CEO who has strong interactions with a Manager, a Lawyer, and an Accountant, or another structure as close to that as possible". Similarly, a 'loop' query could help spot a money laundering ring. Traditional SQL-based methods, as well as more recent graph indexing methods, return no answer when an exact match does not exist. Our method finds exact matches as well as near-matches, and presents them to the user in our proposed 'goodness' order. For example, it tolerates indirect paths between, say, the 'CEO' and the 'Accountant' of the above sample query when direct paths do not exist. Its second feature is scalability. In general, if the query has n_q nodes and the data graph has n nodes, the problem requires O(n^{n_q}) time: polynomial in n, but with an exponent that grows with the query size, which is prohibitive. Our G-Ray ("Graph X-Ray") method finds high-quality subgraphs in time linear in the size of the data graph. Experimental results on the DBLP author-publication graph (with 356K nodes and 1.9M edges) illustrate both the effectiveness and the scalability of our approach. The results agree with our intuition, and the speed is excellent: a 4-node query on the DBLP graph takes 4 seconds on average.
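The tolerance for indirect paths can be illustrated with a proximity score that rewards short, well-connected routes instead of requiring a direct edge. To our understanding, the G-Ray paper builds its goodness score on random walk with restart (RWR). This abstract does not say so, so treat it as an assumption. The sketch below covers only one step of best-effort matching: from an anchor node (the 'CEO'), pick the node with a requested label that has the highest RWR proximity. The function names, the restart probability 0.15 and the toy graph are all hypothetical. The dense matrix keeps the example short; a linear-time version would use a sparse adjacency structure.

```python
import numpy as np

def rwr_scores(adj, seed, restart=0.15, iters=100):
    """Random walk with restart, r = (1 - c) * P @ r + c * e_seed, by power iteration."""
    deg = adj.sum(axis=0).astype(float)
    P = adj / np.where(deg == 0, 1.0, deg)   # divide column j by deg[j]: column-stochastic
    e = np.zeros(adj.shape[0])
    e[seed] = 1.0
    r = e.copy()
    for _ in range(iters):
        r = (1 - restart) * (P @ r) + restart * e
    return r

def closest_with_label(adj, labels, anchor, wanted):
    """Best-effort match: the node labeled `wanted` with the highest RWR proximity to `anchor`.

    Works even when no node with that label is a direct neighbor of `anchor`;
    returns None when no node carries the label.
    """
    scores = rwr_scores(adj, anchor)
    candidates = [v for v, lab in enumerate(labels) if lab == wanted and v != anchor]
    return max(candidates, key=lambda v: scores[v]) if candidates else None

if __name__ == "__main__":
    # The CEO (node 0) has no Accountant neighbor; Accountant 2 is 2 hops away, Accountant 3 is 4 hops away.
    edges = [(0, 1), (1, 2), (1, 4), (4, 5), (5, 3)]
    adj = np.zeros((6, 6))
    for u, v in edges:
        adj[u, v] = adj[v, u] = 1
    labels = ["CEO", "Manager", "Accountant", "Accountant", "Lawyer", "Lawyer"]
    print(closest_with_label(adj, labels, anchor=0, wanted="Accountant"))  # prints 2
```

Neither Accountant is adjacent to the CEO, so an exact-match query would return nothing. The proximity score still ranks the nearer Accountant first.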
Given a large time-evolving graph, how can we model and characterize the temporal behaviors of individual nodes (and network states)? How can we model the behavioral transition patterns of nodes? We propose a temporal behavior model that captures the "roles" of nodes in the graph and how they evolve over time. The proposed dynamic behavioral mixed-membership model (DBMM) is scalable, fully automatic (no user-defined parameters), non-parametric/data-driven (no specific functional form or parameterization), interpretable (identifies explainable patterns), and flexible (applicable to dynamic and streaming networks). Moreover, the interpretable behavioral roles are generalizable and computationally efficient. We applied our model for (a) identifying patterns and trends of nodes and network states based on the temporal behavior, (b) predicting future structural changes, and (c) detecting unusual temporal behavior transitions. The experiments demonstrate the scalability, flexibility, and effectiveness of our model for identifying interesting patterns, detecting unusual structural transitions, and predicting the future structural changes of the network and individual nodes.
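The abstract does not give DBMM's equations, so the following is only a sketch in the same spirit, under these assumptions. Each snapshot has already been summarized as a node-by-role mixed-membership matrix G_t (for example, by a role-extraction step). A single role-to-role transition matrix T is then learned by least squares so that G_t T approximates G_{t+1}. The same T supports tasks (b) and (c) above: forecasting next-step memberships, and flagging nodes whose actual transition is far from the forecast. The clipping, the row normalization and the function names `fit_transition`, `predict_next` and `transition_residuals` are choices of this sketch, not the published model.

```python
import numpy as np

def fit_transition(G_prev, G_next):
    """Least-squares r x r matrix T with G_prev @ T ~ G_next, clipped and row-normalized."""
    T, *_ = np.linalg.lstsq(G_prev, G_next, rcond=None)
    T = np.clip(T, 0.0, None)                          # keep transition weights nonnegative
    row_sums = T.sum(axis=1, keepdims=True)
    return T / np.where(row_sums == 0, 1.0, row_sums)  # each row: distribution over next roles

def predict_next(G_curr, T):
    """Forecast next-step role memberships for every node."""
    return G_curr @ T

def transition_residuals(G_prev, G_next, T):
    """Per-node distance from the learned transition; large values flag unusual behavior."""
    return np.linalg.norm(G_next - G_prev @ T, axis=1)

if __name__ == "__main__":
    T_true = np.array([[0.9, 0.1], [0.2, 0.8]])        # hypothetical role dynamics
    G0 = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.3, 0.7]])
    G1 = G0 @ T_true
    G1[0] = [0.0, 1.0]                                  # node 0 abruptly switches roles
    T = fit_transition(G0, G0 @ T_true)
    print(transition_residuals(G0, G1, T).round(3))    # node 0 stands out
```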
This paper evaluates several modifications of the Simple Bayesian Classifier to enable estimation and inference over relational data. The resulting Relational Bayesian Classifiers are evaluated on three real-world datasets and compared to a baseline SBC using no relational information. The approach we call INDEPVAL achieves the best results. We use synthetic data sets to further explore performance as relational data characteristics vary.
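The abstract names INDEPVAL without defining it. As we understand the relational Bayesian classifier literature, INDEPVAL treats every value in a related-object multiset (for example, the topics of a paper's cited neighbors) as an independent draw. Each value therefore contributes its own conditional-probability factor, exactly as in the simple Bayesian classifier. The sketch below assumes that reading, adds Laplace smoothing with an extra slot for unseen values, and uses the hypothetical names `IndepValRBC`, `fit`, `log_posterior` and `predict`.

```python
import math
from collections import Counter, defaultdict

class IndepValRBC:
    """Relational naive Bayes treating every value in a multiset as an independent draw."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                            # Laplace smoothing strength

    def fit(self, instances, labels):
        """`instances`: list of dicts mapping attribute name -> list (multiset) of values."""
        self.class_counts = Counter(labels)
        self.value_counts = defaultdict(Counter)      # (attr, class) -> counts of values
        self.vocab = defaultdict(set)                 # attr -> values seen in training
        for inst, c in zip(instances, labels):
            for attr, values in inst.items():
                for v in values:
                    self.value_counts[(attr, c)][v] += 1
                    self.vocab[attr].add(v)
        return self

    def log_posterior(self, inst, c):
        """log P(c) + sum over attributes and over every value v in the multiset of log P(v | c)."""
        total = sum(self.class_counts.values())
        lp = math.log(self.class_counts[c] / total)
        for attr, values in inst.items():
            counts = self.value_counts[(attr, c)]
            denom = sum(counts.values()) + self.alpha * (len(self.vocab[attr]) + 1)
            for v in values:
                lp += math.log((counts[v] + self.alpha) / denom)
        return lp

    def predict(self, inst):
        return max(self.class_counts, key=lambda c: self.log_posterior(inst, c))

if __name__ == "__main__":
    train = [{"nbr": ["db", "db"]}, {"nbr": ["db", "ml"]},
             {"nbr": ["ml", "ml"]}, {"nbr": ["ml", "ai"]}]
    clf = IndepValRBC().fit(train, ["DB", "DB", "ML", "ML"])
    print(clf.predict({"nbr": ["db", "db", "ml"]}))   # prints DB
```

Because each value is a separate factor, a node with many related objects accumulates many factors. This is one reason the paper's alternative estimators, which summarize a multiset differently, can behave differently as relational data characteristics vary.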
We address the problem of classification in partially labeled networks (a.k.a. within-network classification) where observed class labels are sparse. Techniques for statistical relational learning have been shown to perform well on network classification tasks by exploiting dependencies between class labels of neighboring nodes. However, relational classifiers can fail when unlabeled nodes have too few labeled neighbors to support learning (during the training phase) and/or inference (during the testing phase). This situation arises in real-world problems when observed labels are sparse. In this paper, we propose a novel approach to within-network classification that combines aspects of statistical relational learning and semi-supervised learning to improve classification performance in sparse networks. Our approach works by adding "ghost edges" to a network, which enable the flow of information from labeled to unlabeled nodes. Through experiments on real-world data sets, we demonstrate that our approach performs well across a range of conditions where existing approaches, such as collective classification and semi-supervised learning, fail. On all tasks, our approach improves area under the ROC curve (AUC) by up to 15 points over existing approaches. Furthermore, we demonstrate that our approach runs in time proportional to L · E, where L is the number of labeled nodes and E is the number of edges.
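The L · E running time follows naturally if each labeled node launches one proximity computation that costs O(E) per iteration. The sketch below assumes, as we understand the ghost-edge paper, that ghost-edge weights come from random walk with restart (RWR); the abstract itself does not name the weighting. Each labeled node's RWR scores act as weighted ghost edges to every other node. A deliberately simple weighted vote over those edges then stands in for the relational classifiers the paper actually pairs them with. The function names, the restart probability and the iteration count are hypothetical.

```python
from collections import defaultdict

def rwr(adj_list, seed, restart=0.15, iters=50):
    """RWR proximity from `seed`; each iteration visits every edge once, i.e. O(n + E)."""
    r = {v: 0.0 for v in adj_list}
    r[seed] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj_list}
        for u, nbrs in adj_list.items():
            if nbrs:
                share = (1 - restart) * r[u] / len(nbrs)
                for v in nbrs:
                    nxt[v] += share
        nxt[seed] += restart                 # restart mass returns to the seed
        r = nxt
    return r

def ghost_edge_classify(adj_list, labels):
    """Label each unlabeled node by summing ghost-edge (RWR) weights from labeled nodes per class.

    `labels` maps labeled nodes to classes. One RWR per labeled node gives a total cost of
    O(L * iters * (n + E)), i.e. proportional to L * E for a fixed iteration count.
    """
    scores = defaultdict(lambda: defaultdict(float))
    for src, cls in labels.items():
        for v, w in rwr(adj_list, src).items():
            if v not in labels:
                scores[v][cls] += w
    return {v: max(s, key=s.get) for v, s in scores.items()}

if __name__ == "__main__":
    # Path 0-1-...-7 with only the two endpoints labeled: most unlabeled nodes have no labeled neighbor.
    path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 7] for i in range(8)}
    print(ghost_edge_classify(path, {0: "A", 7: "B"}))
```

In the toy path, nodes 2 through 5 have no labeled neighbors, which is exactly the sparse-label situation where a purely neighbor-based relational classifier has nothing to work with.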
Nutrition screening identifies individuals who are malnourished or at risk of becoming malnourished and who may benefit from nutrition support. The aim of this study was to validate a new malnutrition screening tool (MST) in cancer patients undergoing radiotherapy. The MST was compared with the subjective global assessment (SGA) of nutritional status. One hundred and six patients attending two cancer care centres in Australia were independently rated as well nourished or malnourished using SGA and at risk or not at risk of malnutrition using the MST. Convergent validity of the MST was established by determining the ability of the MST to predict SGA. According to SGA, 89% of the patients were well nourished and 11% were moderately malnourished. According to the MST, 28% of patients were at risk of malnutrition. The MST had a sensitivity of 100% and a specificity of 81%. The positive predictive value was 0.4 and the negative predictive value was 1.0. The MST is easy to use and is a strong predictor of nutritional status. The malnutrition screening tool is a simple, quick, valid tool that can be used to identify radiation oncology outpatients who are at risk of malnutrition.
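The four accuracy measures reported above all come from the 2x2 table of screening result (MST at risk or not) against the reference standard (SGA malnourished or not). The abstract does not give that table. The counts used below are hypothetical, chosen only because they are consistent with the reported figures for 106 patients (about 11% malnourished, about 28% flagged at risk); they are not taken from the study. The function name `screening_metrics` is also hypothetical.

```python
def screening_metrics(tp, fp, fn, tn):
    """Standard diagnostic-accuracy measures of a screening test against a reference standard.

    tp: at risk and malnourished      fp: at risk but well nourished
    fn: not at risk but malnourished  tn: not at risk and well nourished
    """
    return {
        "sensitivity": tp / (tp + fn),   # share of malnourished patients the screen catches
        "specificity": tn / (tn + fp),   # share of well-nourished patients the screen clears
        "ppv": tp / (tp + fp),           # chance a flagged patient is truly malnourished
        "npv": tn / (tn + fn),           # chance a cleared patient is truly well nourished
    }

if __name__ == "__main__":
    # Hypothetical counts, not reported in the abstract.
    m = screening_metrics(tp=12, fp=18, fn=0, tn=76)
    print({k: round(v, 2) for k, v in m.items()})
    # prints {'sensitivity': 1.0, 'specificity': 0.81, 'ppv': 0.4, 'npv': 1.0}
```

The example also shows why the positive predictive value is modest despite perfect sensitivity. When only about 11% of patients are malnourished, even a specificity of 81% produces more false positives (18) than true positives (12), so PPV is 12/30 = 0.4.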