Esther Galbrun scite author profile

Urban navigation beyond shortest route: The case of safe paths

Advancements in mobile technology and computing have fostered the collection of a large number of civic datasets that capture the pulse of urban life. Furthermore, the open government and data initiative has led many local authorities to make these datasets publicly available, hoping to drive innovation that will further improve the quality of life for city dwellers. In this paper, we develop a novel application that utilizes crime data to provide safe urban navigation. Specifically, using crime data from Chicago and Philadelphia, we develop a risk model for their urban street networks, which allows us to estimate the relative probability of a crime on any road segment. Given such a model, we define two variants of the SAFEPATHS problem, where the goal is to find a short and low-risk path between a source and a destination location. Since both the length and the risk of the path are equally important but cannot be combined into a single objective, we approach the urban-navigation problem as a biobjective shortest path problem. Our algorithms aim to output a small set of paths that provide tradeoffs between distance and safety. Our experiments demonstrate the efficacy of our algorithms and their practical applicability.
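The biobjective formulation above can be illustrated with a small sketch. The abstract does not specify the algorithms used, so the following is a generic label-setting search for Pareto-optimal paths, not the authors' method; the function name `pareto_paths`, the toy graph, and its (length, risk) weights are all hypothetical.

```python
import heapq

def pareto_paths(graph, source, target):
    """Return Pareto-optimal (length, risk, path) triples from source to target.

    graph maps node -> list of (neighbor, length, risk); weights are non-negative.
    """
    # labels[node] holds the non-dominated (length, risk) pairs found so far.
    labels = {source: [(0, 0)]}
    queue = [(0, 0, [source])]
    results = []
    while queue:
        length, risk, path = heapq.heappop(queue)
        node = path[-1]
        # Skip labels that became dominated after they were pushed.
        if (length, risk) not in labels.get(node, []):
            continue
        if node == target:
            results.append((length, risk, path))
            continue
        for nxt, d_len, d_risk in graph.get(node, []):
            if nxt in path:  # keep paths simple
                continue
            cand = (length + d_len, risk + d_risk)
            existing = labels.setdefault(nxt, [])
            # Discard the candidate if a known label is at least as good on both criteria.
            if any(l <= cand[0] and r <= cand[1] for l, r in existing):
                continue
            # Drop known labels that the candidate dominates, then record it.
            labels[nxt] = [(l, r) for l, r in existing
                           if not (cand[0] <= l and cand[1] <= r)] + [cand]
            heapq.heappush(queue, (cand[0], cand[1], path + [nxt]))
    return results

# Hypothetical toy street network: edges carry (length, risk).
graph = {
    "A": [("B", 1, 5), ("C", 2, 1)],
    "B": [("D", 1, 5), ("C", 1, 0)],
    "C": [("D", 2, 1)],
}
front = pareto_paths(graph, "A", "D")
```

On this toy graph the search keeps the short but risky route through B and the longer but safer route through C, and discards the route through both B and C because the route through C alone is no longer and less risky.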

From black and white to full color: extending redescription mining outside the Boolean world

Galbrun

Miettinen

2012

Statistical Analysis and Data Mining

Redescription mining is a powerful data analysis tool that is used to find multiple descriptions of the same entities. Consider geographical regions as an example. They can be characterized by the fauna that inhabits them on one hand and by their meteorological conditions on the other hand. Finding such redescriptors, a task known as niche‐finding, is of much importance in biology. Current redescription mining methods can handle only Boolean data. This restricts the range of possible applications or makes discretization a pre‐requisite, entailing a possibly harmful loss of information. In niche‐finding, while the fauna can be naturally represented using Boolean presence/absence data, the weather cannot. In this paper, we extend redescription mining to categorical and real‐valued data with possibly missing values using a surprisingly simple and efficient approach. We provide extensive experimental evaluation to study the behavior of the proposed algorithm. Furthermore, we show the statistical significance of our results using recent innovations on randomization methods. © 2012 Wiley Periodicals, Inc. Statistical Analysis and Data Mining, 2012

Top-k overlapping densest subgraphs

Galbrun

Gionis

Tatti

2016

Data Min Knowl Disc

Finding dense subgraphs is an important problem in graph mining and has many practical applications. At the same time, while large real-world networks are known to have many communities that are not well-separated, the majority of the existing work focuses on the problem of finding a single densest subgraph. Hence, it is natural to consider the question of finding the top-k densest subgraphs. One major challenge in addressing this question is how to handle overlaps: eliminating overlaps completely is one option, but this may lead to extracting subgraphs that are less dense than would be possible by allowing a limited amount of overlap. Furthermore, overlaps are desirable, as in most real-world graphs there are vertices that belong to more than one community, and thus to more than one densest subgraph. In this paper we study the problem of finding top-k overlapping densest subgraphs, and we present a new approach that improves over the existing techniques, both in theory and practice. First, we reformulate the problem definition so that we can obtain an algorithm with a constant-factor approximation guarantee. Our approach relies on techniques for solving the max-sum diversification problem, which, however, we need to extend in order to make them applicable to our setting. Second, we evaluate our algorithm on a collection of benchmark datasets and show that it convincingly outperforms the previous methods, both in terms of quality and efficiency.
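For readers unfamiliar with the single densest-subgraph problem mentioned above, here is a minimal sketch. It assumes density is measured as the number of edges divided by the number of vertices (a common definition that the abstract does not state) and uses the well-known greedy peeling heuristic. It is not the top-k overlapping algorithm of the paper; the function name and the toy graph are hypothetical.

```python
def greedy_densest_subgraph(edges):
    """Greedy peeling: repeatedly remove a minimum-degree vertex and
    remember the densest intermediate vertex set (density = edges / vertices)."""
    # Build an undirected adjacency map; self-loops are assumed absent.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    n_edges = sum(len(nb) for nb in adj.values()) // 2
    best_density, best_nodes = 0.0, set()
    while adj:
        density = n_edges / len(adj)
        if density > best_density:
            best_density, best_nodes = density, set(adj)
        # Pick a minimum-degree vertex; ties are broken by label for determinism,
        # so vertex labels must be mutually comparable.
        v = min(adj, key=lambda x: (len(adj[x]), x))
        for u in adj[v]:
            adj[u].discard(v)
        n_edges -= len(adj[v])
        del adj[v]
    return best_nodes, best_density

# Hypothetical toy graph: a 4-clique {a, b, c, d} with a pendant path d-e-f.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"),
         ("c", "d"), ("d", "e"), ("e", "f")]
nodes, density = greedy_densest_subgraph(edges)
```

Here the peeling removes f and then e, leaving the 4-clique with 6 edges on 4 vertices, which is the densest vertex set encountered.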

Overlapping community detection in labeled graphs

Galbrun

Gionis

Tatti

2014

Data Min Knowl Disc

We present a new approach for the problem of finding overlapping communities in graphs and social networks. Our approach consists of a novel problem definition and three accompanying algorithms. We are particularly interested in graphs that have labels on their vertices, although our methods are also applicable to graphs with no labels. Our goal is to find k communities so that the total edge density over all k communities is maximized. In the case of labeled graphs, we require that each community is succinctly described by a set of labels. This requirement provides a better understanding of the discovered communities. The proposed problem formulation leads to the discovery of vertex-overlapping and dense communities that cover as many graph edges as possible. We capture these properties with a simple objective function, which we solve by adapting efficient approximation algorithms for the generalized maximum-coverage problem and the densest-subgraph problem. Our proposed algorithm is a generic greedy scheme. We experiment with three variants of the scheme, obtained by varying the greedy step of finding a dense subgraph. We validate our algorithms by comparing with other state-of-the-art community-detection methods on a variety of performance measures. Our experiments confirm that our algorithms achieve results of high quality in terms of the reported measures, and are practical in terms of performance.

Redescription Mining

Galbrun

Miettinen

2017

Maximizing the Diversity of Exposure in a Social Network

Aslay

Matakos

Galbrun

et al. 2018

Social-media platforms have created new ways for citizens to stay informed and participate in public debates. However, to enable a healthy environment for information sharing, social deliberation, and opinion formation, citizens need to be exposed to sufficiently diverse viewpoints that challenge their assumptions, instead of being trapped inside filter bubbles. In this paper, we take a step in this direction and propose a novel approach to maximize the diversity of exposure in a social network. We formulate the problem in the context of information propagation, as a task of recommending a small number of news articles to selected users. We propose a realistic setting where we take into account content and user leanings, and the probability of further sharing an article. This setting allows us to capture the balance between maximizing the spread of information and ensuring the exposure of users to diverse viewpoints. The resulting problem can be cast as maximizing a monotone and submodular function subject to a matroid constraint on the allocation of articles to users. It is a challenging generalization of the influence maximization problem. Yet, we are able to devise scalable approximation algorithms by introducing a novel extension to the notion of random reverse-reachable sets. We experimentally demonstrate the efficiency and scalability of our algorithm on several real-world datasets.

Association Discovery in Two-View Data

Leeuwen

Galbrun

2015

IEEE Trans. Knowl. Data Eng.

Two-view datasets are datasets whose attributes are naturally split into two sets, each providing a different view on the same set of objects. We introduce the task of finding small and non-redundant sets of associations that describe how the two views are related. To achieve this, we propose a novel approach in which sets of rules are used to translate one view to the other and vice versa. Our models, dubbed translation tables, contain both unidirectional and bidirectional rules that span both views and provide lossless translation from either of the views to the opposite view. To be able to evaluate different translation tables and perform model selection, we present a score based on the Minimum Description Length (MDL) principle. Next, we introduce three TRANSLATOR algorithms to find good models according to this score. The first algorithm is parameter-free and iteratively adds the rule that improves compression most. The other two algorithms use heuristics to achieve better trade-offs between runtime and compression. The empirical evaluation on real-world data demonstrates that only modest numbers of associations are needed to characterize the two-view structure present in the data, while the obtained translation rules are easily interpretable and provide insight into the data.

Mining Periodic Patterns with a MDL Criterion

Galbrun

Cellier

Tatti

et al. 2019

The quantity of event logs available is increasing rapidly, be they produced by industrial processes, computing systems, or life tracking, for instance. It is thus important to design effective ways to uncover the information they contain. Because event logs often record repetitive phenomena, mining periodic patterns is especially relevant when considering such data. Indeed, capturing such regularities is instrumental in providing condensed representations of the event sequences. We present an approach for mining periodic patterns from event logs while relying on a Minimum Description Length (MDL) criterion to evaluate candidate patterns. Our goal is to extract a set of patterns that suitably characterises the periodic structure present in the data. We evaluate the interest of our approach on several real-world event log datasets.
