No abstract
Abstract. We focus on the discovery of interesting patterns in dynamic attributed graphs. To this end, we define the novel problem of mining cohesive co-evolution patterns. Briefly speaking, cohesive co-evolution patterns are tri-sets of vertices, timestamps, and signed attributes that describe the local co-evolutions of similar vertices at several timestamps according to set of signed attributes that express attributes trends. We design the first algorithm to mine the complete set of cohesive co-evolution patterns in a dynamic graph. Some experiments performed on both synthetic and real-world datasets demonstrate that our algorithm enables to discover relevant patterns in a feasible time.
International audienceMultidimensional databases have been designed to provide decision makers with the necessary tools to help them understand their data. This framework is different from transactional data as the datasets contain huge volumes of historicized and aggregated data defined over a set of dimensions that can be arranged through multiple levels of granularities. Many tools have been proposed to query the data and navigate through the levels of granularity. However, automatic tools are still missing to mine this type of data in order to discover regular specific patterns. In this article, we present a method for mining sequential patterns from multidimensional databases, at the same time taking advantage of the different dimensions and levels of granularity, which is original compared to existing work. The necessary definitions and algorithms are extended from regular sequential patterns to this particular case. Experiments are reported, showing the significance of this approach
Mining sequential patterns aims at discovering correlations between events through time, like A customer who bought a TV and a DVD player at the same time later bought a recorder. Even if many works have dealt with sequential pattern mining, there is no work leading to rules like A customer who bought a surfboard together with a bag in NY later bought a wetsuit in SF. In this paper, we propose M 2 SP, a complete approach to mine such multidimensional sequential patterns. The main originality of our proposition is that we obtain not only intra-pattern sequences but also inter-pattern sequences. Moreover, we consider generalized multidimensional sequential patterns, called jokerized patterns, in which some of the dimension values may not be instanciated. Experiments on synthetic data are reported and show the scalability of our approach.
An important goal in researching the biology of olfaction is to link the perception of smells to the chemistry of odorants. In other words, why do some odorants smell like fruits and others like flowers? While the so-called stimulus-percept issue was resolved in the field of color vision some time ago, the relationship between the chemistry and psycho-biology of odors remains unclear up to the present day. Although a series of investigations have demonstrated that this relationship exists, the descriptive and explicative aspects of the proposed models that are currently in use require greater sophistication. One reason for this is that the algorithms of current models do not consistently consider the possibility that multiple chemical rules can describe a single quality despite the fact that this is the case in reality, whereby two very different molecules can evoke a similar odor. Moreover, the available datasets are often large and heterogeneous, thus rendering the generation of multiple rules without any use of a computational approach overly complex. We considered these two issues in the present paper. First, we built a new database containing 1689 odorants characterized by physicochemical properties and olfactory qualities. Second, we developed a computational method based on a subgroup discovery algorithm that discriminated perceptual qualities of smells on the basis of physicochemical properties. Third, we ran a series of experiments on 74 distinct olfactory qualities and showed that the generation and validation of rules linking chemistry to odor perception was possible. Taken together, our findings provide significant new insights into the relationship between stimulus and percept in olfaction. In addition, by automatically extracting new knowledge linking chemistry of odorants and psychology of smells, our results provide a new computational framework of analysis enabling scientists in the field to test original hypotheses using descriptive or predictive modeling.
Geo-located social media provide a large amount of information describing urban areas based on user descriptions and comments. Such data makes possible to identify meaningful city neighborhoods on the basis of the footprints left by a large and diverse population that uses this type of media. In this paper, we present some methods to exhibit the predominant activities and their associated urban areas to automatically describe a whole city. Based on a suitably attributed graph model, our approach identifies neighborhoods with homogeneous and exceptional characteristics. We introduce the novel problem of exceptional subgraph mining in attributed graphs and propose a complete algorithm that takes benefits from closure operators, new upper bounds and pruning properties. We also define an approach to sample the space of closed exceptional subgraphs within a given time-budget. Experiments performed on 10 real datasets are reported and demonstrate the relevancy of both approaches, and also show their limits.
International audienceData mining is the study of how to extract information from data and express it as useful knowledge. One of its most important subfields, pattern mining, involves searching and enumerating interesting patterns in data. Various aspects of pattern mining are studied in the theory of computation and statistics. In the last decade, the pattern mining community has witnessed a sharp shift from efficiency-based approaches to methods which can extract more meaningful patterns. Recently, new methods adapting results from studies of economic efficiency and multi-criteria decision analyses such as Pareto efficiency, or skylines, have been studied. Within pattern mining, this novel line of research allows the easy expression of preferences according to a dominance relation. This approach is useful from a user-preference point of view and tends to promote the use of pattern mining algorithms for non-experts. We present a significant extension of our previous work [1] and [2] on the discovery of skyline patterns (or “skypatterns”) based on the theoretical relationships with condensed representations of patterns. We show how these relationships facilitate the computation of skypatterns and we exploit them to propose a flexible and efficient approach to mine skypatterns using a dynamic constraint satisfaction problems (CSP) framework.We present a unified methodology of our different approaches towards the same goal. This work is supported by an extensive experimental study allowing us to illustrate the strengths and weaknesses of each approach
International audiencePopular data mining methods support knowledge discovery from patterns that hold in binary relations. We study the generalization of association rule mining within arbitrary n-ary relation and thus Boolean tensors instead of Boolean matrices. Indeed, many datasets of interest correspond to relations whose number of dimensions is greater or equal to3. However, just a few proposals deal with rule discovery when both the head and the body can involve subsets of any dimensions. A challenging problem is to provide a semantics to such generalized rules by means of objective interestingness measures that have to be carefully designed. Therefore, we discuss the need for different generalizations of the classical confidence measure. We also present the first algorithm that computes, in such a general framework, every rule that satisfies both a minimal frequency constraint and minimal confidence constraints. The approach is tested on real datasets (ternary and 4-ary relations). We report on a case study that concerns dynamic graph analysis thanks to rules
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
334 Leonard St
Brooklyn, NY 11211
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.