Artur Bykowski scite author profile

Approximation of Frequency Queries by Means of Free-Sets

Abstract. Given a large collection of transactions containing items, a basic and common data mining problem is to extract the so-called frequent itemsets (i.e., sets of items appearing in at least a given number of transactions). In this paper, we propose a structure called free-sets, from which we can approximate the support of any itemset (i.e., the number of transactions containing the itemset), and we formalize this notion in the framework of ε-adequate representations [10]. We show that frequent free-sets can be efficiently extracted using pruning strategies developed for frequent itemset discovery, and that they can be used to approximate the support of any frequent itemset. Experiments run on real dense data sets show a significant reduction in the size of the output compared with standard frequent itemset extraction. Furthermore, the experiments show that the extraction of frequent free-sets is still possible when the extraction of frequent itemsets becomes intractable. Finally, we show that the error made when approximating frequent itemset support remains very low in practice.
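The level-wise extraction of frequent free-sets can be sketched in Python. This is a minimal sketch under stated assumptions: it uses the δ-free definition from the wider free-sets literature (an itemset X is δ-free when no rule X \ {a} ⇒ a, for a in X, holds with at most δ exceptions), which the abstract does not spell out, and the names `support`, `is_delta_free` and `frequent_free_sets` are hypothetical. Because both frequency and δ-freeness are anti-monotone (every subset of a frequent δ-free set is also frequent and δ-free), candidates can be pruned exactly as in Apriori.

```python
def support(transactions, itemset):
    """Number of transactions (frozensets) that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def is_delta_free(transactions, itemset, delta):
    """Assumed definition: no rule (itemset - {a}) -> a has at most delta exceptions."""
    s = support(transactions, itemset)
    for a in itemset:
        exceptions = support(transactions, itemset - {a}) - s
        if exceptions <= delta:
            return False
    return True


def frequent_free_sets(transactions, min_support, delta):
    """Hypothetical level-wise (Apriori-style) search for frequent delta-free itemsets."""
    items = sorted(set().union(*transactions))
    # The empty set has no rule to violate, so it is trivially delta-free.
    level = {frozenset()} if len(transactions) >= min_support else set()
    result = set()
    while level:
        result |= level
        candidates = {s | {i} for s in level for i in items if i not in s}
        # Pruning: every immediate subset must already be a frequent free-set.
        candidates = {c for c in candidates if all(c - {i} in result for i in c)}
        level = {
            c for c in candidates
            if support(transactions, c) >= min_support
            and is_delta_free(transactions, c, delta)
        }
    return result
```

For instance, with the transactions {a, b}, {a, b, c} and {c}, the set {a, b} is not 0-free because every transaction containing b also contains a, so its support can be read off from the support of {b}.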

A condensed representation to find frequent patterns

Bykowski

Rigotti

2001

Given a large set of data, a common data mining problem is to extract the frequent patterns occurring in this set. The idea presented in this paper is to extract a condensed representation of the frequent patterns called disjunction-free sets, instead of extracting the whole frequent pattern collection. We show that this condensed representation can be used to regenerate all frequent patterns and their exact frequencies. Moreover, this regeneration can be performed without any access to the original data. Practical experiments show that this representation can be extracted very efficiently even in difficult cases. We compared it with another representation of frequent patterns previously investigated in the literature called frequent closed sets. In nearly all experiments we have run, the disjunction-free sets have been extracted much more efficiently than frequent closed sets.
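The idea of regenerating exact frequencies without touching the data can be illustrated with a small Python sketch. It assumes a simplified reading of disjunction-freeness: an itemset X is treated as not disjunction-free when some rule X \ {a, b} ⇒ a ∨ b, with two distinct items a and b, holds without exception. The paper's exact definition and its border handling are not reproduced here, and all function names are hypothetical. When such a rule holds, inclusion-exclusion yields the support of X from the supports of three of its subsets.

```python
from itertools import combinations


def support(transactions, itemset):
    """Number of transactions (frozensets) that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def disjunctive_rule(transactions, itemset):
    """Return (a, b) if the rule (itemset - {a, b}) -> a OR b holds with no exception,
    or None if the itemset is disjunction-free under this simplified definition."""
    for a, b in combinations(sorted(itemset), 2):
        body = itemset - {a, b}
        # The rule holds when every transaction containing the body contains a or b.
        if all(a in t or b in t for t in transactions if body <= t):
            return a, b
    return None


def derive_support(stored, itemset, a, b):
    """Regenerate sup(X) from stored supports only (no data access), using
    sup(X) = sup(Y | {a}) + sup(Y | {b}) - sup(Y), where Y = X - {a, b}."""
    y = itemset - {a, b}
    return stored[y | {a}] + stored[y | {b}] - stored[y]
```

The identity follows because, when every transaction containing Y contains a or b, sup(Y) = sup(Y ∪ {a}) + sup(Y ∪ {b}) - sup(Y ∪ {a, b}).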

Frequent Closures as a Concise Representation for Binary Data Mining

Boulicaut

Bykowski

2000

DBC: a condensed representation of frequent patterns for efficient mining

Bykowski

Rigotti

2003

Information Systems

Given a large set of data, a common data mining problem is to extract the frequent patterns occurring in this set. The idea presented in this paper is to extract a condensed representation of the frequent patterns called disjunction-bordered condensation (DBC), instead of extracting the whole frequent pattern collection. We show that this condensed representation can be used to regenerate all frequent patterns and their exact frequencies. Moreover, this regeneration can be performed without any access to the original data. Practical experiments show that the DBC can be extracted very efficiently even in difficult cases, and that extracting the DBC and then regenerating the frequent patterns is much more efficient than extracting the frequent patterns directly. We compared the DBC with another representation of frequent patterns previously investigated in the literature, called frequent closed sets. In nearly all experiments we have run, the DBC has been extracted much more efficiently than the frequent closed sets. In the other cases, the extraction times are very close.

Model-Independent Bounding of the Supports of Boolean Formulae in Binary Data

Bykowski

Seppänen

Hollmén

2004

Abstract. Data mining algorithms such as the Apriori method for finding frequent sets in sparse binary data can be used for efficient computation of a large number of summaries from huge data sets. The collection of frequent sets gives a collection of marginal frequencies about the underlying data set. Sometimes, we would like to use a collection of such marginal frequencies instead of the entire data set (e.g., when the original data are inaccessible for confidentiality reasons) to compute other interesting summaries. Using combinatorial arguments, we may obtain tight upper and lower bounds on the values of inferred summaries. In this paper, we consider a class of summaries wider than frequent sets, namely that of frequencies of arbitrary Boolean formulae. Given the frequencies of several different Boolean formulae, we consider the problem of finding tight bounds on the frequency of another arbitrary formula. We give a general formulation of the problem of bounding formula frequencies given some background information, and show how the bounds can be obtained by solving a linear programming problem. We illustrate the accuracy of the bounds by giving empirical results on real data sets.
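The linear-programming formulation can be sketched in Python for a handful of items. Each truth assignment to the n items (an "atom") gets an unknown frequency; each known formula frequency, plus the requirement that atom frequencies sum to 1, becomes an equality constraint, and the target formula's frequency is the objective to minimise and maximise. To stay within the standard library, this sketch swaps in a different solving technique than an LP solver: it enumerates the basic feasible solutions (vertices) exactly with `fractions.Fraction`, which is only practical for very small n. The function names and the encoding of formulae as Python predicates are assumptions for illustration.

```python
from fractions import Fraction
from itertools import combinations, product


def solve_unique(cols, b):
    """Solve sum_j x_j * cols[j] = b exactly. Return x when the columns are
    linearly independent and the system is consistent, otherwise None."""
    m, k = len(b), len(cols)
    # Augmented matrix: one row per constraint, k coefficient columns + right-hand side.
    M = [[Fraction(cols[j][i]) for j in range(k)] + [Fraction(b[i])] for i in range(m)]
    for col in range(k):
        pivot = next((r for r in range(col, m) if M[r][col] != 0), None)
        if pivot is None:
            return None  # dependent columns cannot form a basis
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        for r in range(m):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    # Rows below the pivots must read 0 = 0; otherwise the system is inconsistent.
    if any(M[r][k] != 0 for r in range(k, m)):
        return None
    return [M[j][k] for j in range(k)]


def bound_frequency(n_items, known, target):
    """Tight (lower, upper) bounds on the frequency of `target`, given `known`,
    a list of (formula, frequency) pairs. Formulae are predicates on a tuple of
    n_items booleans (one truth value per item)."""
    atoms = list(product([False, True], repeat=n_items))
    # Equality constraints: one per known formula, plus "atom frequencies sum to 1".
    rows = [[int(f(a)) for a in atoms] for f, _ in known] + [[1] * len(atoms)]
    b = [Fraction(v) for _, v in known] + [Fraction(1)]
    columns = [[row[j] for row in rows] for j in range(len(atoms))]
    objective = [int(target(a)) for a in atoms]
    values = []
    # A linear objective over this bounded polytope is optimal at a vertex, i.e. a
    # basic feasible solution supported on at most len(rows) independent columns.
    for k in range(1, len(rows) + 1):
        for basis in combinations(range(len(atoms)), k):
            x = solve_unique([columns[j] for j in basis], b)
            if x is not None and all(v >= 0 for v in x):
                values.append(sum(objective[j] * v for j, v in zip(basis, x)))
    if not values:
        raise ValueError("the known frequencies are mutually inconsistent")
    return min(values), max(values)
```

With only f(a) = 3/5 and f(b) = 7/10 known, the bounds on f(a ∧ b) come out as 3/10 and 3/5, which are the classical Fréchet bounds max(0, f(a) + f(b) - 1) and min(f(a), f(b)). For realistic numbers of items, a dedicated LP solver would replace the vertex enumeration.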

Towards the Tractable Discovery of Association Rules with Negations

Boulicaut

Bykowski

Jeudy

2001

Abstract. Frequent association rules (e.g., A ∧ B ⇒ C, stating that when properties A and B are true in a record, C also tends to be true) have become a popular way to summarize huge datasets. Over the last 5 years, there has been a lot of research on association rule mining and, more precisely, on the tractable discovery of interesting rules among the frequent ones. We now consider the problem of mining association rules that may involve negations, e.g., A ∧ B ⇒ ¬C or ¬A ∧ B ⇒ C. Mining such rules is difficult and remains an open problem. We identify several possibilities for a tractable approach in practical cases. Among others, we discuss the active use of constraints. We propose a generic algorithm and discuss the use of constraints to mine the generalized sets from which rules with negations can be derived.
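One reason negated rules can be derived from positive itemset supports is the inclusion-exclusion identity sup(X ∧ ¬Y) = Σ over Z ⊆ Y of (-1)^|Z| · sup(X ∪ Z). The Python sketch below applies it to score rules such as A ∧ B ⇒ ¬C. It is an illustration of that standard identity, not the paper's generic algorithm or its constraint-based pruning; the function names are hypothetical, and the number of terms grows as 2^|Y|.

```python
from fractions import Fraction
from itertools import combinations


def support(transactions, itemset):
    """Number of transactions (frozensets) containing every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def generalized_support(transactions, positive, negative):
    """Support of 'all items in positive and none in negative', computed only
    from supports of positive itemsets via inclusion-exclusion."""
    total = 0
    for k in range(len(negative) + 1):
        for z in combinations(sorted(negative), k):
            total += (-1) ** k * support(transactions, frozenset(positive) | frozenset(z))
    return total


def confidence(transactions, body_pos, body_neg, head_pos, head_neg):
    """Confidence of the rule body => head, where either side may contain negated
    items; returns None when the body never occurs."""
    body = generalized_support(transactions, body_pos, body_neg)
    both = generalized_support(transactions, body_pos | head_pos, body_neg | head_neg)
    return Fraction(both, body) if body else None
```

For example, with the transactions {A, B, C}, {A, B}, {A, B, D}, {A, C} and {B}, sup(A ∧ B ∧ ¬C) = sup(AB) - sup(ABC) = 3 - 1 = 2, so A ∧ B ⇒ ¬C has confidence 2/3.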

Untitled

2003

Integrity Constraints over Association Rules

Bykowski

Daurel

Méger

et al. 2004

Artur Bykowski

Approximation of Frequency Queries by Means of Free-Sets

A condensed representation to find frequent patterns

Frequent Closures as a Concise Representation for Binary Data Mining

DBC: a condensed representation of frequent patterns for efficient mining

Model-Independent Bounding of the Supports of Boolean Formulae in Binary Data

Towards the Tractable Discovery of Association Rules with Negations

Untitled

Integrity Constraints over Association Rules
