Approximating the number of frequent sets in dense data

Boley, Mario; Großkreutz, Henrik

doi:10.1007/s10115-009-0212-4

Cited by 20 publications

(17 citation statements)

References 24 publications


“…The worst-case convergence can, however, be exponentially slow in the size of the input database. For sampling from the family of frequent patterns, this problem appears to be inherent: almost uniform frequent pattern sampling can be used for approximate frequent pattern counting, which one can show to be intractable under reasonable complexity assumptions (see [7]). Similar conclusions can be drawn for enumeration spaces defined by linearly scaled versions of the frequency measure such as the standard optimistic estimator for the binomial test quality function in subgroup discovery [27].…”

Section: Related Work (mentioning)

confidence: 99%


Direct local pattern sampling by efficient two-step random procedures

Boley, Lucchese, Paurat, et al. 2011

Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Self Cite


We present several exact and highly scalable local pattern sampling algorithms. They can be used as an alternative to exhaustive local pattern discovery methods (e.g., frequent set mining or optimistic-estimator-based subgroup discovery) and can substantially improve efficiency as well as controllability of pattern discovery processes. While previous sampling approaches mainly rely on the Markov chain Monte Carlo method, our procedures are direct, i.e., non-process-simulating, sampling algorithms. The advantages of these direct methods are an almost optimal time complexity per pattern as well as an exactly controlled distribution of the produced patterns. Namely, the proposed algorithms can sample (item-)sets according to frequency, area, squared frequency, and a class discriminativity measure. Experiments demonstrate that these procedures can improve the accuracy of pattern-based models similar to frequent sets and often also lead to substantial gains in terms of scalability. Copyright 2011 ACM
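To make the idea of a direct two-step procedure concrete, here is a minimal Python sketch of sampling an itemset with probability proportional to its frequency. It is an illustration under stated assumptions, not the authors' implementation: the function name `sample_by_frequency`, the toy database, and the choice of weighting each transaction by its number of subsets are all assumptions introduced here.

```python
import random
from collections import Counter


def sample_by_frequency(transactions, rng):
    """Draw one itemset with probability proportional to its support count.

    Assumption (not taken from the abstract): step 1 weights each
    transaction t by 2^|t|, the number of its subsets; step 2 returns a
    uniformly random subset of the chosen transaction.
    """
    # Step 1: choose a transaction with probability proportional to 2^|t|.
    weights = [2 ** len(t) for t in transactions]
    chosen = rng.choices(transactions, weights=weights, k=1)[0]
    # Step 2: keep each item independently with probability 1/2, which
    # yields a uniformly random subset of the chosen transaction.
    return frozenset(item for item in chosen if rng.random() < 0.5)


if __name__ == "__main__":
    # Hypothetical toy database: two transactions over items "a" and "b".
    db = [frozenset("ab"), frozenset("a")]
    rng = random.Random(0)
    counts = Counter(sample_by_frequency(db, rng) for _ in range(20000))
    for itemset, n in counts.most_common():
        print(sorted(itemset), n / 20000)
```

Under these assumptions a set F is returned with probability equal to the sum, over transactions t containing F, of (2^|t| / Z) · (1 / 2^|t|), which is support(F) / Z with Z the sum of all weights. No Markov chain is simulated, so each draw is an exact sample.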


“…Boley and Grosskreutz [7] proposes frequent set sampling to approximate the effect of specific minimum frequency thresholds. The proposed algorithm simulates a simple Glauber dynamic on the frequent set lattice: starting with the empty set, in each subsequent time step a single item is either removed or added to the current set.…”

Section: Related Work (mentioning)

confidence: 99%

Direct local pattern sampling by efficient two-step random procedures

Boley, Lucchese, Paurat, et al. 2011

Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
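The random walk described in the citation statement above (start at the empty set, then add or remove one item per step) can be sketched as follows. This is a hedged illustration only: the names `support` and `glauber_walk`, the rule of rejecting any move that would leave the family of frequent sets, and the toy database are assumptions made here, not details confirmed by the quoted text.

```python
import random


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def glauber_walk(transactions, items, min_support, steps, rng):
    """Simulate a single-item add/remove walk on the frequent sets.

    Assumes the empty set is frequent, i.e. min_support <= len(transactions).
    """
    current = frozenset()  # the walk starts at the empty set
    for _ in range(steps):
        item = rng.choice(items)
        proposal = current ^ {item}  # toggle: remove if present, else add
        # Removing an item never lowers the support, so only additions
        # need a support check. Infrequent proposals are rejected
        # (assumed rule) and the walk stays where it is.
        if item in current or support(proposal, transactions) >= min_support:
            current = proposal
    return current


if __name__ == "__main__":
    # Hypothetical toy database over three items.
    db = [frozenset("abc"), frozenset("ab"), frozenset("bc"), frozenset("a")]
    rng = random.Random(0)
    print(sorted(glauber_walk(db, ["a", "b", "c"], 2, 100, rng)))
```

With this acceptance rule the transition probabilities are symmetric, so the uniform distribution over frequent sets is stationary. The sketch says nothing about how many steps are needed before the walk is close to that distribution.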


“…In [2], the authors introduced a generic sampling framework to sample the output space of frequent subgraphs, which is based on MCMC algorithm as well. In the context of itemset mining, [5] proposed a randomized approximation method for counting the number of frequent itemsets. In [4] a Metropolis-Hastings algorithm for sampling closed itemsets is given.…”

Section: Related Work (mentioning)

confidence: 99%

Sampling minimal frequent boolean (DNF) patterns

Zaki 2012

Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining


We tackle the challenging problem of mining the simplest Boolean patterns from categorical datasets. Instead of complete enumeration, which is typically infeasible for this class of patterns, we develop effective sampling methods to extract a representative subset of the minimal Boolean patterns (in disjunctive normal form, DNF). We make both theoretical and practical contributions, which allow us to prune the search space based on provable properties. Our approach can provide a near-uniform sample of the minimal DNF patterns. We also show that the mined minimal DNF patterns are very effective when used as features for classification.


“…Our solution for frequent pattern mining from hidden data utilizes sampling of patterns from the frequent pattern space. In the literature, there exist several pattern sampling algorithms, but they do not fulfill the purpose of our task. The major difference of these algorithms with ours is that the sampling distribution of our method changes through user interaction, whereas in the above‐cited works the sampling distribution remains fixed throughout the sampling session.…”

Section: Introduction (mentioning)

confidence: 99%

Interactive knowledge discovery from hidden data through sampling of frequent patterns

Bhuiyan, Hasan 2016

Statistical Analysis and Data Mining: The ASA Data Science Journal


In real life, many important datasets are not publicly accessible for various reasons, including privacy protection and maintenance of business competitiveness. However, knowledge discovery and pattern mining from these datasets can bring enormous benefit both to the data owner and to external entities. In this paper, we propose a novel solution for this task, which is based on Markov chain Monte Carlo (MCMC) sampling of frequent patterns. Instead of returning all the frequent patterns, the proposed paradigm sends back a small set of randomly selected patterns so that the confidentiality of the dataset can be maintained. Our solution also allows interactive sampling, so that the sampled patterns can fulfill the user's requirements effectively. We show experimental results from several real‐life datasets to validate the capability and usefulness of our solution. In particular, we show by example that, using our proposed solution, an eCommerce marketplace can allow pattern mining on user session data without disclosing the data to the public; such a mining paradigm can help the sellers in the marketplace, which can eventually boost the market's own revenue. © 2016 Wiley Periodicals, Inc. Statistical Analysis and Data Mining: The ASA Data Science Journal, 2016
