Discovering Unbounded Episodes in Sequential Data

Casas-Garriga, Gemma

doi:10.1007/978-3-540-39804-2_10

Cited by 76 publications (100 citation statements)

References 5 publications (12 reference statements)


“…Further interestingness measures for episodes, either statistically motivated or aimed at removing bias towards smaller episodes, were made by Garriga [5], Gwadera et al. [10,11], Calders et al. [4], and Tatti [17]. All these methods, however, were limited to finding interesting episodes, and stopped short of discovering association rules between them.…”

Section: Related Work (mentioning, confidence: 99%)

“…First of all, the confidence of rule G ⇒ H would depend on our choice of disjoint minimal windows: if we chose the first minimal window of H, s[1,5], we would find two occurrences of G outside it and the confidence of the rule would be 1/3, whereas if we chose the second minimal window of H, s[4,10], we would find just one occurrence of G outside it, and the confidence would be 2/3. More importantly, whichever choice we made, we would not be able to get the correct result, namely one showing that every occurrence of G is contained within an occurrence of H. Now that we have seen that we cannot define the confidence of an association rule using either the disjoint-window frequencies or the containment of the disjoint occurrences of the two episodes, we are ready to present a definition that corresponds exactly to our intuition.…”

Section: Using Minimal Windows (mentioning, confidence: 99%)
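The argument in this excerpt can be checked with a small sketch. The excerpt does not give the sequence or the episodes G and H. The occurrence windows of G below are therefore hypothetical. They are chosen only so that picking the single H window s[1,5] or s[4,10] reproduces the quoted confidences 1/3 and 2/3. The helper names (`inside`, `containment_ratio`) are also our own illustration, not the paper's definitions. Counting G windows contained in *any* minimal window of H yields 1. This is consistent with the excerpt's point that every occurrence of G lies within some occurrence of H.

```python
from fractions import Fraction

# Hypothetical occurrence windows (start, end) of G; the excerpt does not give
# the sequence, so these are chosen only to reproduce its 1/3 and 2/3.
G_WINDOWS = [(2, 3), (6, 7), (8, 9)]
# The two overlapping minimal windows of H named in the excerpt.
H_FIRST, H_SECOND = (1, 5), (4, 10)


def inside(inner, outer):
    """True if window `inner` lies entirely within window `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def containment_ratio(g_windows, h_windows):
    """Fraction of G windows contained in at least one of the given H windows."""
    hits = sum(1 for g in g_windows if any(inside(g, h) for h in h_windows))
    return Fraction(hits, len(g_windows))


print(containment_ratio(G_WINDOWS, [H_FIRST]))             # 1/3
print(containment_ratio(G_WINDOWS, [H_SECOND]))            # 2/3
print(containment_ratio(G_WINDOWS, [H_FIRST, H_SECOND]))   # 1
```

Restricting H to one of two overlapping windows is exactly what a disjoint-window choice forces, and the result changes with that choice.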

“…Consider sequence s = axybca and episodes X = b → c and Y = {a, b → c}. Sequence s contains one minimal window of X, namely s[4,5]. This occurrence of X can be extended in search of an occurrence of Y. There are two candidate minimal windows of Y that can be considered, s[1,5] and s[4,6]. If we choose the former, the extensibility of s[4,5] into Y would equal 2/5, and if we use the latter, this value would rise up to 2/3. Since we are interested in how far we need to look in order to extend an occurrence of X into an occurrence of Y, we clearly need to look for the smallest minimal window of Y that satisfies the conditions.…”

Section: Definition 14 (mentioning, confidence: 99%)
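The Definition 14 example can be reproduced with a brute-force sketch, under several assumptions of ours:

- An episode is given as a list of event labels plus (u, v) pairs meaning node u must occur strictly before node v.
- The sequence has one event per position.
- Extensibility is the width of X's window divided by the width of Y's window. This reading matches the quoted 2/5 and 2/3.
- "The conditions" on a Y window mean that it contains the X window.

All function names are hypothetical. The search enumerates every window and assignment, so it only suits tiny examples such as this one.

```python
from fractions import Fraction
from itertools import permutations

# An episode is (labels, edges): labels[k] is the event of node k, and
# (u, v) in edges means node u must occur strictly before node v.
X = (["b", "c"], [(0, 1)])            # serial episode b -> c
Y = (["a", "b", "c"], [(1, 2)])       # a unordered, b -> c
S = "axybca"                          # one event per position, 1-indexed below


def occurs_in(seq, episode, i, j):
    """True if the episode occurs within window seq[i..j] (1-indexed, inclusive)."""
    labels, edges = episode
    for pos in permutations(range(i, j + 1), len(labels)):
        labels_match = all(seq[p - 1] == lab for p, lab in zip(pos, labels))
        order_ok = all(pos[u] < pos[v] for u, v in edges)
        if labels_match and order_ok:
            return True
    return False


def minimal_windows(seq, episode):
    """Windows containing the episode whose two one-step shrinks do not."""
    n = len(seq)
    return [
        (i, j)
        for i in range(1, n + 1)
        for j in range(i, n + 1)
        if occurs_in(seq, episode, i, j)
        and not occurs_in(seq, episode, i + 1, j)
        and not occurs_in(seq, episode, i, j - 1)
    ]


def extensibility(x_win, y_win):
    """Assumed: width of the X window divided by width of the Y window."""
    return Fraction(x_win[1] - x_win[0] + 1, y_win[1] - y_win[0] + 1)


def best_extension(x_win, y_wins):
    """Smallest Y window containing x_win (assumed reading of 'the conditions')."""
    cands = [w for w in y_wins if w[0] <= x_win[0] and x_win[1] <= w[1]]
    return min(cands, key=lambda w: w[1] - w[0]) if cands else None


x_win = minimal_windows(S, X)[0]
y_wins = minimal_windows(S, Y)
print(x_win, y_wins)                                     # (4, 5) [(1, 5), (4, 6)]
print([str(extensibility(x_win, w)) for w in y_wins])   # ['2/5', '2/3']
print(best_extension(x_win, y_wins))                     # (4, 6)
```

Checking only the two one-step shrinks is enough for minimality. Any smaller window inside [i, j] is itself contained in [i+1, j] or [i, j-1].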


MARBLES: Mining Association Rules Buried in Long Event Sequences

Čule, Tatti, Goethals (2012)

Proceedings of the 2012 SIAM International Conference on Data Mining


Sequential pattern discovery is a well-studied field in data mining. Episodes are sequential patterns that describe events that often occur in the vicinity of each other. Episodes can impose restrictions on the order of the events, which makes them a versatile technique for describing complex patterns in the sequence. Most of the research on episodes deals with special cases such as serial and parallel episodes, while discovering general episodes is surprisingly understudied. This is particularly true when it comes to discovering association rules between them. In this paper we propose an algorithm that mines association rules between two general episodes. On top of the traditional definitions of frequency and confidence, we introduce two novel confidence measures for the rules. The major challenge in mining these association rules is pattern explosion. To limit the output, we aim to eliminate all redundant rules. We define the class of closed association rules, and show that this class contains all non-redundant output. To make the algorithm efficient, we use further pruning steps along the way: first, we generate only free and closed frequent episodes, from which we create candidate rules; second, we speed up the evaluation of the rules; and finally, we prune the remaining non-closed rules from the output.
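To illustrate the free/closed pruning the abstract mentions, here is a minimal sketch using the standard itemset-style definitions. It applies only to parallel episodes, written as sets of events. A pattern is treated as closed if no proper superpattern has the same support, and as free if no proper subpattern has the same support. The support counts are hypothetical. The free check compares only against the non-empty patterns listed. The closure used in MARBLES for general episodes may be defined differently, so this is an analogy rather than the paper's procedure.

```python
# Hypothetical support counts of parallel episodes, written as sets of events.
SUPPORT = {
    frozenset("a"): 5,
    frozenset("b"): 4,
    frozenset("c"): 3,
    frozenset("ab"): 4,
    frozenset("ac"): 3,
    frozenset("bc"): 3,
    frozenset("abc"): 3,
}


def name(pattern):
    """Render a set of events as a sorted string, e.g. {'b', 'a'} -> 'ab'."""
    return "".join(sorted(pattern))


def closed(support):
    """Patterns with no proper superpattern of equal support."""
    return sorted(name(p) for p in support
                  if not any(p < q and support[q] == support[p] for q in support))


def free(support):
    """Patterns with no listed proper subpattern of equal support."""
    return sorted(name(p) for p in support
                  if not any(q < p and support[q] == support[p] for q in support))


print(closed(SUPPORT))  # ['a', 'ab', 'abc']
print(free(SUPPORT))    # ['a', 'b', 'c']
```

Only 3 of the 7 frequent patterns survive each filter. This is the kind of output reduction the abstract aims for when it restricts candidate rules to free and closed episodes.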


MARBLES: Mining association rules buried in long event sequences

Čule, Tatti, Goethals (2013)

Statistical Analysis


Method evaluation, parameterization, and result validation in unsupervised data mining: A critical survey

Zimmermann (2019)

WIREs Data Min & Knowl


Machine Learning (ML) and Data Mining (DM) build tools intended to help users solve data‐related problems that are infeasible for “unaugmented” humans. Tools need manuals, however, and in the case of ML/DM methods, this means guidance with respect to which technique to choose, how to parameterize it, and how to interpret derived results to arrive at knowledge about the phenomena underlying the data. While such information is available in the literature, it has not yet been collected in one place. We survey three types of work for clustering and pattern mining: (1) comparisons of existing techniques, (2) evaluations of different parameterization options and studies providing guidance for setting parameter values, and (3) work comparing mining results with the ground truth. We find that although interesting results exist, as a whole the body of work on these questions is too limited. In addition, we survey recent studies in the field of community detection, as a contrasting example. We argue that an objective obstacle for performing needed studies is a lack of data and survey the state of available data, pointing out certain limitations. As a solution, we propose to augment existing data by artificially generated data, review the state‐of‐the‐art in data generation in unsupervised mining, and identify shortcomings. In more general terms, we call for the development of a true “Data Science” that, based on work in other domains, results in ML, and existing tools, develops needed data generators and builds up the knowledge needed to effectively employ unsupervised mining techniques. This article is categorized under: Fundamental Concepts of Data and Knowledge > Key Design Issues in Data Mining; Ensemble Methods > Structure Discovery; Internet > Society and Culture; Fundamental Concepts of Data and Knowledge > Motivation and Emergence of Data Mining.

