2000
DOI: 10.1145/380995.381017

Mining frequent patterns with counting inference

Abstract: In this paper, we propose the algorithm PASCAL which introduces a novel optimization of the well-known algorithm Apriori. This optimization is based on a new strategy called pattern counting inference that relies on the concept of key patterns. We show that the support of frequent non-key patterns can be inferred from frequent key patterns without accessing the database. Experiments comparing PASCAL to the three algorithms Apriori, Close, and Max-Miner show that PASCAL is among the most efficient algorithms f…
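The counting-inference strategy described in the abstract can be sketched in Python: during level-wise (Apriori-style) candidate generation, a candidate that has a non-key immediate subset inherits the minimum support of its immediate subsets without a database pass. This is a minimal sketch of the idea, not the paper's implementation; the transaction data and function names are illustrative.

```python
# Toy transaction database (illustrative, not from the paper).
transactions = [
    {"a", "b", "d"},
    {"a", "d"},
    {"a", "c", "d"},
    {"b", "c"},
    {"a", "b", "c", "d"},
]

def support(itemset, db):
    """Count the transactions containing itemset (one database pass)."""
    return sum(1 for t in db if itemset <= t)

def pascal(db, minsup):
    """Level-wise mining with pattern counting inference (simplified).

    A pattern is a key pattern iff its support is strictly smaller than
    that of every immediate subset.  If a candidate has a non-key
    immediate subset, its support equals the minimum support of its
    immediate subsets, so no database pass is needed.
    Returns {frozenset: support} for all frequent itemsets.
    """
    supports, nonkey = {}, set()
    prev = {}
    for i in sorted(set().union(*db)):          # level 1: always counted
        s = support({i}, db)
        if s >= minsup:
            fs = frozenset([i])
            prev[fs] = supports[fs] = s
            if s == len(db):                    # supp({i}) == supp({}) -> non-key
                nonkey.add(fs)
    k = 2
    while prev:
        cur = {}
        candidates = {p | q for p in prev for q in prev if len(p | q) == k}
        for c in candidates:
            subsets = [c - {i} for i in c]
            if any(s not in supports for s in subsets):
                continue                        # an immediate subset is infrequent
            m = min(supports[s] for s in subsets)
            if any(s in nonkey for s in subsets):
                supp_c = m                      # inferred: no database pass
                nonkey.add(c)
            else:
                supp_c = support(c, db)         # key candidate: must be counted
                if supp_c == m:
                    nonkey.add(c)
            if supp_c >= minsup:
                cur[c] = supports[c] = supp_c
        prev, k = cur, k + 1
    return supports
```

In this toy data {a, d} is non-key (its support, 4, equals that of {a} and {d}), so the supports of {a, b, d} and {a, c, d} are inferred rather than counted against the database.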

Cited by 222 publications (139 citation statements)
References 21 publications
“…Since the number of frequent itemsets can be huge in dense databases, it is now common to use condensed representations (e.g., free itemsets, closed ones, non derivable itemsets [10]) to save space and time during the frequent itemset mining task and to avoid some redundancy. Since [11], it is common to formalize the fact that many itemsets have the same closure by means of closure equivalence relation. Each CEC contains exactly one maximal itemset (w.r.t.…”
Section: Feature Construction Using Closure Equivalence Classes
confidence: 99%
“…One breakthrough into the computational complexity of such mining tasks has been obtained thanks to condensed representations for frequent itemsets, i.e., rather small collections of patterns from which one can infer the frequency of many sets instead of counting for it (see [10] for a survey). In this paper, we consider closure equivalence classes, i.e., frequent closed sets and their generators [11]. Furthermore, when considering the δ-free itemsets with δ > 0 [12,13], we can consider a "near equivalence" perspective and thus, roughly speaking, the concept of almost-closed itemsets.…”
Section: Introduction
confidence: 99%
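The closure equivalence classes discussed in the statement above can be illustrated with a short sketch: the closure of an itemset is the intersection of all transactions containing it, and itemsets sharing a closure form one class whose unique maximal member is the closed itemset and whose minimal members are the generators (key patterns). The dataset and the `max_size` cutoff below are illustrative assumptions, not from the cited papers.

```python
from itertools import combinations

# Illustrative transaction database (hypothetical).
db = [
    {"a", "c", "d"},
    {"b", "c", "e"},
    {"a", "b", "c", "e"},
    {"b", "e"},
    {"a", "b", "c", "e"},
]

def closure(itemset, db):
    """Closure = intersection of all transactions containing the itemset."""
    covers = [t for t in db if itemset <= t]
    return frozenset(set.intersection(*covers)) if covers else frozenset()

def closure_classes(db, max_size=3):
    """Group all occurring itemsets (up to max_size) by their closure."""
    items = sorted(set().union(*db))
    classes = {}
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            if any(s <= t for t in db):        # keep only itemsets that occur
                classes.setdefault(closure(s, db), []).append(s)
    return classes
```

Here closure({b}) = {b, e}, so {b}, {e}, and {b, e} fall into one equivalence class: the maximal member {b, e} is the closed itemset and {b} and {e} are its generators.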
“…It computes in a level-wise manner all frequent key sets, and in the same step their closures. Pascal [5] differs from Titanic in that it additionally produces all frequent itemsets. [4] discusses efficient data structures for the algorithms Pascal and Titanic.…”
Section: Algorithms For Computing Frequent Closed / Key Itemsets
confidence: 99%
“…Since our rules are complete, this shows that additional gain is in many cases unlikely. In their PASCAL algorithm [3], Bastide et al. use counting inference to avoid counting the support of all candidates. The rule they are using to avoid counting is based on our rule R I (I − {i}).…”
Section: Proof
confidence: 99%