DESQ: Frequent Sequence Mining with Subsequence Constraints

ACM Trans. Database Syst.

Martens

2019

Self Cite

Frequent sequence mining methods often make use of constraints to control which subsequences should be mined. A variety of such subsequence constraints has been studied in the literature, including length, gap, span, regular-expression, and hierarchy constraints. In this article, we show that many subsequence constraints (including and beyond those considered in the literature) can be unified in a single framework. A unified treatment allows researchers to study many types of subsequence constraints jointly (instead of each one individually) and helps to improve the usability of pattern mining systems for practitioners. In more detail, we propose a set of simple and intuitive "pattern expressions" to describe subsequence constraints and explore algorithms for efficiently mining frequent subsequences under such general constraints. Our algorithms translate pattern expressions to succinct finite-state transducers, which we use as a computational model, and simulate these transducers in a way suitable for frequent sequence mining. Our experimental study on real-world datasets indicates that our algorithms, although more general, are efficient and, when used for sequence mining with prior constraints studied in the literature, competitive with (and in some cases superior to) state-of-the-art specialized methods.

Section: Worst-case Runtime Of Two-pass Approach

mentioning

confidence: 95%

Section: Computational Model

mentioning

confidence: 99%

Section: Related Work

mentioning

confidence: 99%

“…A preliminary version of this article appeared in Reference[11] 2. https://www.uni-mannheim.de/dws/research/resources/desq/.…”

mentioning

confidence: 99%


A Unified Framework for Frequent Sequence Mining with Subsequence Constraints

Beedkar

ACM Trans. Database Syst.

Martens

2019

Self Cite

“…One approach to improve flexibility is the use of subsequence constraints, which specify conditions under which a subsequence is potentially interesting to the particular application. Ordered by increasing flexibility, common types of subsequence constraints include length constraints [28], [34], gap and duration constraints [14], [28], [34], hierarchy constraints [28], "output filter" regular expression constraints [2], [3], [13], [31], and regular expression constraints with capture groups and hierarchies [5], [7]. The latter type subsumes the remaining ones, and we subsequently refer to it as flexible constraints.…”

Section: Introduction

mentioning

confidence: 99%

Scalable Frequent Sequence Mining with Flexible Subsequence Constraints

Renz-Wieland

Bertsch

2019 IEEE 35th International Conference on Data Engineering (ICDE)

2019

Self Cite

We study scalable algorithms for frequent sequence mining under flexible subsequence constraints. Such constraints enable applications to specify concisely which patterns are of interest and which are not. We focus on the bulk synchronous parallel model with one round of communication; this model is suitable for platforms such as MapReduce or Spark. We derive a general framework for frequent sequence mining under this model and propose the D-SEQ and D-CAND algorithms within this framework. The algorithms differ in what data are communicated and how computation is split up among workers. To the best of our knowledge, D-SEQ and D-CAND are the first scalable algorithms for frequent sequence mining with flexible constraints. We conducted an experimental study on multiple real-world datasets that suggests that our algorithms scale nearly linearly, outperform common baselines, and offer acceptable generalization overhead over existing, less general mining algorithms.

DESQ: Frequent Sequence Mining with Subsequence Constraints

Beedkar

2016 IEEE 16th International Conference on Data Mining (ICDM)

2016

Frequent sequence mining methods often make use of constraints to control which subsequences should be mined; e.g., length, gap, span, regular-expression, and hierarchy constraints. We show that many subsequence constraints (including and beyond those considered in the literature) can be unified in a single framework. In more detail, we propose a set of simple and intuitive "pattern expressions" to describe subsequence constraints and explore algorithms for efficiently mining frequent subsequences under such general constraints. A unified treatment allows researchers to study many types of subsequence constraints jointly (instead of each one individually) and helps to improve the usability of pattern mining systems for practitioners.
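
To make two of the constraint types named in these abstracts concrete (length and gap), the following is a minimal brute-force sketch in Python. It is not DESQ, it does not use pattern expressions or finite-state transducers, and it is far less efficient than the algorithms the papers describe. The function names, the parameters `max_len`, `max_gap`, and `min_support`, and the toy database are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def constrained_subsequences(seq, max_len, max_gap):
    """Return the distinct subsequences of `seq` that have at most `max_len`
    items and skip at most `max_gap` input items between consecutive picks.
    (Illustrative brute force; names and parameters are assumptions.)"""
    found = set()

    def extend(last_idx, prefix):
        if prefix:
            found.add(prefix)
        if len(prefix) == max_len:
            return  # length constraint reached
        if last_idx is None:
            candidates = range(len(seq))  # first item may start anywhere
        else:
            # gap constraint: next pick lies within max_gap skipped items
            candidates = range(last_idx + 1,
                               min(len(seq), last_idx + max_gap + 2))
        for i in candidates:
            extend(i, prefix + (seq[i],))

    extend(None, ())
    return found


def mine_frequent(db, min_support, max_len, max_gap):
    """Count each constrained subsequence once per input sequence and keep
    those that occur in at least `min_support` sequences."""
    support = Counter()
    for seq in db:
        support.update(constrained_subsequences(seq, max_len, max_gap))
    return {p: c for p, c in support.items() if c >= min_support}


# Toy database (made up for this sketch).
db = [("a", "b", "c"), ("a", "c"), ("b", "a", "c")]
print(mine_frequent(db, min_support=2, max_len=2, max_gap=0))
```

With `max_gap=0` only adjacent items can be combined, so `("a", "c")` is supported by the second and third sequences but not the first; raising `max_gap` to 1 lets the first sequence support it as well.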