Dóra Erdős scite author profile

Miettinen

2013

Tensors are becoming increasingly common in data mining, and consequently, tensor factorizations are becoming more important tools for data miners. When the data is binary, it is natural to ask if we can factorize it into binary factors while simultaneously making sure that the reconstructed tensor is still binary. Such factorizations, called Boolean tensor factorizations, can provide improved interpretability and find Boolean structure that is hard to express using normal factorizations. Unfortunately, the algorithms for computing Boolean tensor factorizations do not usually scale well. In this paper we present a novel algorithm for finding Boolean CP and Tucker decompositions of large and sparse binary tensors. In our experimental evaluation we show that our algorithm can handle large tensors and accurately reconstructs the latent Boolean structure.
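
To make the difference between a Boolean and a normal factorization concrete, here is a minimal sketch of how a Boolean CP decomposition reconstructs a tensor. It is not the algorithm from the paper: the function name `boolean_cp_reconstruct` and the factor names `A`, `B`, `C` are assumptions chosen for illustration, and the tiny factors are made-up example data.

```python
def boolean_cp_reconstruct(A, B, C):
    """Rebuild a binary 3-way tensor from binary factor matrices.

    Each factor matrix has one row per index of its mode and one column
    per rank-1 component. Entry (i, j, k) is 1 if at least one component
    covers it: X[i][j][k] = OR_r (A[i][r] AND B[j][r] AND C[k][r]).
    """
    rank = len(A[0])
    return [
        [
            [
                int(any(A[i][r] and B[j][r] and C[k][r] for r in range(rank)))
                for k in range(len(C))
            ]
            for j in range(len(B))
        ]
        for i in range(len(A))
    ]


# Hypothetical rank-2 factors for a 2 x 2 x 1 tensor.
A = [[1, 1], [0, 1]]
B = [[1, 1], [1, 0]]
C = [[1, 1]]

X = boolean_cp_reconstruct(A, B, C)
```

Entry (0, 0, 0) is covered by both components. Under ordinary arithmetic the sum of products there would be 2, whereas the Boolean OR keeps it at 1, so the reconstructed tensor stays binary.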

A Framework for the Evaluation and Management of Network Centrality

Ishakian

Terzi

et al. 2012

Network-analysis literature is rich in node-centrality measures that quantify the centrality of a node as a function of the (shortest) paths of the network that go through it. Existing work focuses on defining instances of such measures and designing algorithms for the specific combinatorial problems that arise for each instance. In this work, we propose a unifying definition of centrality that subsumes all path-counting based centrality definitions: e.g., stress, betweenness or paths centrality. We also define a generic algorithm for computing this generalized centrality measure for every node and every group of nodes in the network. Next, we define two optimization problems: k-Group Centrality Maximization and k-Edge Centrality Boosting. In the former, the task is to identify the subset of k nodes that have the largest group centrality. In the latter, the goal is to identify up to k edges to add to the network so that the centrality of a node is maximized. We show that both of these problems can be solved efficiently for arbitrary centrality definitions using our general framework. In a thorough experimental evaluation we show the practical utility of our framework and the efficacy of our algorithms.

A Divide-and-Conquer Algorithm for Betweenness Centrality

Ishakian

Bestavros

et al. 2015

Given a graph G, we define the betweenness centrality of a node v in V as the fraction of shortest paths between all node pairs in V that contain v. For this setting we describe Brandes++, a divide-and-conquer algorithm that can efficiently compute the exact values of betweenness scores. Brandes++ uses Brandes, the most widely used algorithm for betweenness computation, as its subroutine. It achieves notably faster running times by applying Brandes on significantly smaller networks than the input graph, and many of its computations can be done in parallel. The degree of speedup achieved by Brandes++ depends on the community structure of the input network. Our experiments with real-life networks reveal that Brandes++ achieves an average 10-fold speedup over Brandes, while there are networks where this speedup is 75-fold. We have made our code public to benefit the research community.
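
For orientation, the following is a minimal sketch of the Brandes subroutine named in the abstract, restricted to unweighted, undirected graphs. It is not Brandes++ and not the authors' released code. The function name `brandes_betweenness` and the adjacency-dictionary input are assumptions, and the score is the usual unnormalized sum over node pairs of the fraction of shortest paths passing through each node.

```python
from collections import deque


def brandes_betweenness(adj):
    """Betweenness scores for an unweighted, undirected graph.

    adj maps each node to an iterable of its neighbours.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:  # w seen for the first time
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v precedes w on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: score / 2 for v, score in bc.items()}


# Hypothetical path graph a - b - c.
scores = brandes_betweenness({"a": ["b"], "b": ["a", "c"], "c": ["b"]})
```

On the path graph only `b` lies between another pair of nodes, so it is the only node with a non-zero score.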

Discovering Facts with Boolean Tensor Tucker Decomposition

Miettinen

2013

Open Information Extraction (Open IE) has gained increasing research interest in recent years. The first step in Open IE is to extract raw subject-predicate-object triples from the data. These raw triples are rarely usable per se, and need additional post-processing. To that end, we propose the use of Boolean Tucker tensor decomposition to simultaneously find the entity and relation synonyms and the facts connecting them from the raw triples. Our method represents the synonym sets and facts using (sparse) binary matrices and a tensor that can be efficiently stored and manipulated. We consider the presentation of the problem as a Boolean tensor decomposition as one of this paper's main contributions. To study the validity of this approach, we use a recent algorithm for scalable Boolean Tucker decomposition. We validate the results with empirical evaluation on a new semi-synthetic data set, generated to faithfully reproduce real-world data features, as well as with real-world data from an existing Open IE extractor. We show that our method obtains high precision while the low recall can easily be remedied by considering the original data together with the decomposition.
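
As a rough illustration of how binary factor matrices can encode synonym sets and a binary core tensor can encode facts, the sketch below evaluates a Boolean Tucker reconstruction. It is not the decomposition algorithm used in the paper. The function name `boolean_tucker_reconstruct`, the variable names, and the toy data (two surface forms of one subject, two phrasings of one relation, one object) are all assumptions made for the example.

```python
def boolean_tucker_reconstruct(G, A, B, C):
    """Rebuild a binary subject x predicate x object tensor.

    A, B, C assign surface forms to latent entities or relations, and the
    core tensor G marks which latent (subject, relation, object) facts hold:
    X[i][j][k] = OR_{p,q,r} (G[p][q][r] AND A[i][p] AND B[j][q] AND C[k][r]).
    """
    P, Q, R = len(G), len(G[0]), len(G[0][0])
    return [
        [
            [
                int(any(
                    G[p][q][r] and A[i][p] and B[j][q] and C[k][r]
                    for p in range(P) for q in range(Q) for r in range(R)
                ))
                for k in range(len(C))
            ]
            for j in range(len(B))
        ]
        for i in range(len(A))
    ]


# Hypothetical toy data: both subject forms map to one latent entity,
# both predicate phrasings map to one latent relation.
A = [[1], [1]]   # 2 subject surface forms -> 1 entity
B = [[1], [1]]   # 2 predicate phrasings   -> 1 relation
C = [[1]]        # 1 object surface form   -> 1 entity
G = [[[1]]]      # the single latent fact holds

triples = boolean_tucker_reconstruct(G, A, B, C)
```

In this toy case one latent fact expands to all four surface-form triples, including combinations that a raw extractor might never have observed together.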

Reconstructing Graphs from Neighborhood Data

ACM Trans. Knowl. Discov. Data

Gemulla

Terzi

2014

Consider a social network and suppose that we are only given the number of common friends between each pair of users. Can we reconstruct the underlying network? Similarly, consider a set of documents and the words that appear in them. If we only know the number of common words for every pair of documents, as well as the number of common documents for every pair of words, can we infer which words appear in which documents? In this article, we develop a general methodology for answering questions like these. We formalize these questions in what we call the RECONSTRUCT problem: given information about the common neighbors of nodes in a network, our goal is to reconstruct the hidden binary matrix that indicates the presence or absence of relationships between individual nodes. In fact, we propose two different variants of this problem: one where the number of connections of every node (i.e., the degree of every node) is known and a second one where it is unknown. We call these variants the degree-aware and the degree-oblivious versions of the RECONSTRUCT problem, respectively. Our algorithms for both variants exploit the properties of the singular value decomposition of the hidden binary matrix. More specifically, we show that using the available neighborhood information, we can reconstruct the hidden matrix by finding the components of its singular value decomposition and then combining them appropriately. Our extensive experimental study suggests that our methods are able to reconstruct binary matrices of different characteristics with up to 100% accuracy.
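
To see why the singular value decomposition is a natural tool here, one can write out what the neighborhood data determines. This is a sketch in assumed notation (M for the hidden binary matrix, and U, Σ, V for its SVD factors; none of these symbols appear in the abstract), not the authors' derivation.

```latex
M = U \Sigma V^{\top}
\quad\Longrightarrow\quad
M M^{\top} = U \Sigma^{2} U^{\top},
\qquad
M^{\top} M = V \Sigma^{2} V^{\top}.
```

In the document-word example, the off-diagonal entries of M M^T are the numbers of common words for each pair of documents, and those of M^T M are the numbers of common documents for each pair of words. For a binary M, the diagonal entries are the node degrees, which is presumably what separates the degree-aware variant from the degree-oblivious one. Eigendecompositions of the two products give Σ² together with U and V, but each singular vector only up to sign (assuming distinct singular values), so consistent signs still have to be chosen before U Σ V^T can be assembled.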