Mining Frequent Itemsets in a Stream

Calders, Toon; Dexters, Nele; Goethals, Bart

doi:10.1109/icdm.2007.66

Cited by 73 publications

(80 citation statements)

References 8 publications


“…Further interestingness measures for episodes, either statistically motivated or aimed at removing bias towards smaller episodes, were made by Garriga [5], Gwadera et al [10,11], Calders et al [4], and Tatti [17]. All these methods, however, were limited to finding interesting episodes, and stopped short of discovering association rules between them.…”

Section: Related Work (mentioning)

confidence: 99%

“…First of all, the confidence of rule G ⇒ H would depend on our choice of disjoint minimal windows: if we chose the first minimal window of H, s[1,5], we would find two occurrences of G outside it and the confidence of the rule would be 1/3, whereas if we chose the second minimal window of H, s[4,10], we would find just one occurrence of G outside it, and the confidence would be 2/3. More importantly, whichever choice we made, we would not be able to get the correct result, showing that every occurrence of G is contained within an occurrence of H. Now that we have seen that we cannot define the confidence of an association rule using either the disjoint-window frequencies, or the containment of the disjoint occurrences, of the two episodes, we are ready to present a definition that corresponds exactly to our intuition.…”

Section: Using Minimal Windows (mentioning)

confidence: 99%

“…Consider sequence s = axybca and episodes X = b → c and Y = {a, b → c}. Sequence s contains one minimal window of X, namely s[4,5]. This occurrence of X can be extended in search of an occurrence of Y.…”

Section: Definition 14 (mentioning)

confidence: 99%

“…This occurrence of X can be extended in search of an occurrence of Y. There are two candidate minimal windows of Y that can be considered, s[1,5] and s[4,6]. If we choose the former, the extensibility of s[4,5] into Y would equal 2/5, and if we use the latter, this value would rise up to 2/3.…”

Section: Definition 14 (mentioning)

confidence: 99%

“…There are two candidate minimal windows of Y that can be considered, s[1,5] and s[4,6]. If we choose the former, the extensibility of s[4,5] into Y would equal 2/5, and if we use the latter, this value would rise up to 2/3. Since we are interested in how far we need to look in order to extend an occurrence of X into an occurrence of Y, we clearly need to look for the smallest minimal window of Y that satisfies the conditions.…”

Section: Definition 14 (mentioning)

confidence: 99%
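
The example in the Definition 14 statements (s = axybca, X = b → c, Y = {a, b → c}) can be checked by brute force. The Python sketch below is not taken from the cited paper: the function names, the episode encoding (a list of event labels plus index pairs for "must come before" constraints), and the reading of extensibility as the ratio of the two window lengths are assumptions inferred from the values given in the quoted text.

```python
from fractions import Fraction
from itertools import product


def covers(s, i, j, labels, order):
    """Return True if s[i..j] (1-indexed, inclusive) contains the episode.

    labels: one event label per episode node (hypothetical encoding).
    order:  pairs (u, v) of node indices meaning node u occurs before node v.
    """
    # Candidate positions inside the window for every episode node.
    positions = [[p for p in range(i, j + 1) if s[p - 1] == lab] for lab in labels]
    for assignment in product(*positions):
        if len(set(assignment)) != len(assignment):
            continue  # two nodes mapped to the same position
        if all(assignment[u] < assignment[v] for u, v in order):
            return True
    return False


def minimal_windows(s, labels, order):
    """All windows that contain the episode while no proper sub-window does."""
    n = len(s)
    found = []
    for i in range(1, n + 1):
        for j in range(i, n + 1):
            # Containment is monotone in the window, so it is enough to
            # test the two windows obtained by shrinking one end.
            if (covers(s, i, j, labels, order)
                    and not covers(s, i + 1, j, labels, order)
                    and not covers(s, i, j - 1, labels, order)):
                found.append((i, j))
    return found


def extensibility(x_window, y_windows):
    """Assumed reading: |X window| / |smallest Y window that contains it|."""
    xi, xj = x_window
    containing = [(i, j) for i, j in y_windows if i <= xi and xj <= j]
    if not containing:
        return Fraction(0)
    i, j = min(containing, key=lambda w: w[1] - w[0])
    return Fraction(xj - xi + 1, j - i + 1)


s = "axybca"
x_windows = minimal_windows(s, ["b", "c"], [(0, 1)])       # X = b -> c
y_windows = minimal_windows(s, ["a", "b", "c"], [(1, 2)])  # Y = {a, b -> c}
print(x_windows, y_windows, extensibility(x_windows[0], y_windows))
```

Under these assumptions the sketch reports s[4,5] as the only minimal window of X and s[1,5] and s[4,6] as the minimal windows of Y, and picking the smallest containing window gives the larger of the two quoted ratios.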


MARBLES: Mining Association Rules Buried in Long Event Sequences

Čule, Tatti, Goethals. 2012. Proceedings of the 2012 SIAM International Conference on Data Mining


Sequential pattern discovery is a well-studied field in data mining. Episodes are sequential patterns that describe events that often occur in the vicinity of each other. Episodes can impose restrictions on the order of the events, which makes them a versatile technique for describing complex patterns in the sequence. Most of the research on episodes deals with special cases such as serial and parallel episodes, while discovering general episodes is surprisingly understudied. This is particularly true when it comes to discovering association rules between them.

In this paper we propose an algorithm that mines association rules between two general episodes. On top of the traditional definitions of frequency and confidence, we introduce two novel confidence measures for the rules. The major challenge in mining these association rules is pattern explosion. To limit the output, we aim to eliminate all redundant rules. We define the class of closed association rules, and show that this class contains all non-redundant output. To make the algorithm efficient, we use further pruning steps along the way. First of all, we generate only free and closed frequent episodes from which we create candidate rules, we speed up the evaluation of the rules, and finally prune the remaining non-closed rules from the output.


MARBLES: Mining association rules buried in long event sequences

Čule

Tatti

Goethals

2013

Statistical Analysis


A survey of itemset mining

Fournier‐Viger, Lin, et al. 2017. WIREs Data Min & Knowl


Itemset mining is an important subfield of data mining, which consists of discovering interesting and useful patterns in transaction databases. The traditional task of frequent itemset mining is to discover groups of items (itemsets) that appear frequently together in transactions made by customers. Although itemset mining was designed for market basket analysis, it can be viewed more generally as the task of discovering groups of attribute values frequently co-occurring in databases. Because of its numerous applications in domains such as bioinformatics, text mining, product recommendation, e‐learning, and web click stream analysis, itemset mining has become a popular research area. This study provides an up‐to‐date survey that can serve both as an introduction and as a guide to recent advances and opportunities in the field. The problem of frequent itemset mining and its applications are described. Moreover, the main approaches and strategies to solve itemset mining problems are presented, together with their characteristics. Limitations of traditional frequent itemset mining approaches are also highlighted, and extensions of the task of itemset mining are presented, such as high‐utility itemset mining, rare itemset mining, fuzzy itemset mining, and uncertain itemset mining. This study also discusses research opportunities and the relationship to other popular pattern mining problems, such as sequential pattern mining, episode mining, subgraph mining, and association rule mining. The main open‐source libraries of itemset mining implementations are also briefly presented. WIREs Data Mining Knowl Discov 2017, 7:e1207. doi: 10.1002/widm.1207

This article is categorized under: Algorithmic Development > Association Rules; Technologies > Association Rules
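
To make the task described in this abstract concrete, the following Python sketch counts the support of every itemset by brute force and keeps those that meet a threshold. It is an illustration only, not an algorithm from the survey or from the cited paper: the function name, the `min_support` parameter, and the toy transactions are assumptions, and the enumeration is exponential in transaction size, so it is only suitable for tiny inputs.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for itemsets in at least min_support transactions.

    Each itemset is a tuple of items in sorted order; support is an absolute
    count of transactions that contain every item of the itemset.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # ignore duplicates within a transaction
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: c for itemset, c in counts.items() if c >= min_support}


# Hypothetical market-basket data.
baskets = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
result = frequent_itemsets(baskets, min_support=2)
for itemset, support in sorted(result.items()):
    print(itemset, support)
```

Practical miners avoid this full enumeration by pruning candidates whose subsets are already known to be infrequent.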
