2009
DOI: 10.1007/s10115-009-0212-4
Approximating the number of frequent sets in dense data

Abstract: We investigate the problem of counting the number of frequent (item)sets, a problem known to be intractable in terms of an exact polynomial-time computation. In this paper, we show that it is in general also hard to approximate. Subsequently, a randomized counting algorithm is developed using the Markov chain Monte Carlo method. While for general inputs an exponential running time is needed in order to guarantee a certain approximation bound, we show that the algorithm still has the desired accuracy on several …
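To make the counted quantity concrete, here is a brute-force exact counter. It is a minimal sketch, not code from the paper: the toy database, absolute support thresholds, and counting the empty set as frequent are assumptions for illustration. It tests all 2^n subsets of the n items, so its running time is exponential; because the number of frequent sets can itself be exponential, even output-sensitive enumeration cannot count them in polynomial time in general.

```python
from itertools import combinations
from typing import FrozenSet, List, Set


def support(itemset: FrozenSet[str], transactions: List[Set[str]]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def count_frequent_sets(transactions: List[Set[str]], min_support: int) -> int:
    """Exact number of itemsets (the empty set included) whose support is at
    least `min_support`, found by testing all 2^n subsets of the n items."""
    items = sorted(set().union(*transactions))
    return sum(
        1
        for k in range(len(items) + 1)
        for combo in combinations(items, k)
        if support(frozenset(combo), transactions) >= min_support
    )


# Hypothetical toy database: four transactions over items a, b, c.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
print(count_frequent_sets(transactions, 2))  # 7: {}, a, b, c, ab, ac, bc
```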

Cited by 20 publications (17 citation statements). References: 24 publications.
“…The worst-case convergence can, however, be exponentially slow in the size of the input database. For sampling from the family of frequent patterns, this problem appears to be inherent: almost uniform frequent pattern sampling can be used for approximate frequent pattern counting, which one can show to be intractable under reasonable complexity assumptions (see [7]). Similar conclusions can be drawn for enumeration spaces defined by linearly scaled versions of the frequency measure such as the standard optimistic estimator for the binomial test quality function in subgroup discovery [27].…”
Section: Related Work (mentioning)
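The reduction mentioned in the statement above, from (almost) uniform sampling to approximate counting, can be illustrated with the textbook self-reducibility argument. This is a sketch of that generic technique, not necessarily the construction used by Boley and Grosskreutz. Let F_i be the frequent subsets of the first i items, so F_0 = {∅} and F_n = F. Then |F| is the product of the ratios |F_i| / |F_{i-1}|, and |F_{i-1}| / |F_i| is the fraction of F_i that avoids item i. By downward closure, removing item i maps the sets containing it injectively into those avoiding it, so each ratio is at least 1/2 and can be estimated well from a modest number of samples. The sampler here, `make_exact_sampler`, is a hypothetical exponential stand-in that enumerates F_i; in practice an MCMC sampler would take its place.

```python
import random
from itertools import combinations
from typing import Callable, FrozenSet, List, Sequence, Set

Itemset = FrozenSet[str]
Sampler = Callable[[], Itemset]


def is_frequent(s: Itemset, transactions: List[Set[str]], min_support: int) -> bool:
    """True if at least `min_support` transactions contain every item of `s`."""
    return sum(1 for t in transactions if s <= t) >= min_support


def make_exact_sampler(items: Sequence[str], transactions: List[Set[str]],
                       min_support: int, rng: random.Random) -> Sampler:
    """Hypothetical stand-in for an almost-uniform sampler: enumerate every
    frequent subset of `items` once, then draw uniformly from that list.
    Exponential in len(items); only for illustrating the reduction."""
    family = [frozenset(c)
              for k in range(len(items) + 1)
              for c in combinations(items, k)
              if is_frequent(frozenset(c), transactions, min_support)]
    return lambda: rng.choice(family)


def estimate_count(items: Sequence[str], transactions: List[Set[str]],
                   min_support: int, samples_per_ratio: int,
                   rng: random.Random, make_sampler=make_exact_sampler) -> float:
    """Estimate |F| via |F| = prod_i |F_i| / |F_{i-1}|, where F_i is the family
    of frequent subsets of items[:i] and F_0 = {empty set}."""
    if len(transactions) < min_support:
        return 0.0  # not even the empty set is frequent, so F is empty
    estimate = 1.0
    for i in range(1, len(items) + 1):
        sample = make_sampler(items[:i], transactions, min_support, rng)
        # The fraction of F_i avoiding items[i-1] estimates |F_{i-1}| / |F_i|;
        # by downward closure this ratio is at least 1/2.
        avoid = sum(1 for _ in range(samples_per_ratio)
                    if items[i - 1] not in sample())
        estimate *= samples_per_ratio / max(avoid, 1)
    return estimate
```

On the toy database {abc, ab, ac, bc} with threshold 2, the exact ratios are 2, 2 and 7/4, whose product is the true count 7; the estimate should land close to it.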
“…Boley and Grosskreutz [7] proposes frequent set sampling to approximate the effect of specific minimum frequency thresholds. The proposed algorithm simulates a simple Glauber dynamic on the frequent set lattice: starting with the empty set, in each subsequent time step a single item is either removed or added to the current set.…”
Section: Related Work (mentioning)
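The add/remove walk described in the statement above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of Boley and Grosskreutz: the lazy step (stay put with probability 1/2, to guarantee aperiodicity), the uniform choice of the item to toggle, and the name `glauber_chain` are illustrative choices. Because every subset of a frequent set is frequent, removals are always accepted, and because the proposal is symmetric, the uniform distribution on the frequent sets is stationary. How quickly the chain approaches it is the convergence question raised in the first statement.

```python
import random
from collections import Counter
from typing import FrozenSet, Iterator, List, Sequence, Set


def support(itemset: FrozenSet[str], transactions: List[Set[str]]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def glauber_chain(items: Sequence[str], transactions: List[Set[str]],
                  min_support: int, steps: int,
                  rng: random.Random) -> Iterator[FrozenSet[str]]:
    """Lazy single-item add/remove walk on the frequent-set lattice.

    Starts at the empty set (assumed frequent: min_support <= len(transactions)).
    Each step: with probability 1/2 keep the current set (laziness); otherwise
    pick an item uniformly and toggle it, rejecting an addition that would make
    the set infrequent. Yields the state after every step."""
    current: FrozenSet[str] = frozenset()
    for _ in range(steps):
        if rng.random() < 0.5:  # lazy step: stay at the current set
            yield current
            continue
        item = rng.choice(items)
        candidate = current ^ frozenset((item,))  # add or remove `item`
        # Removals keep the set frequent (downward closure); additions are checked.
        if item in current or support(candidate, transactions) >= min_support:
            current = candidate
        yield current


# Hypothetical toy database; each of its 7 frequent sets (threshold 2)
# should be visited roughly 1/7 of the time.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
visits = Counter(glauber_chain(["a", "b", "c"], transactions, 2, 70_000, random.Random(1)))
```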
“…In [2], the authors introduced a generic sampling framework to sample the output space of frequent subgraphs, which is based on MCMC algorithm as well. In the context of itemset mining, [5] proposed a randomized approximation method for counting the number of frequent itemsets. In [4] a Metropolis-Hastings algorithm for sampling closed itemsets is given.…”
Section: Related Work (mentioning)
“…Our solution for frequent pattern mining from hidden data utilizes sampling of patterns from the frequent pattern space. In the literature, there exist several pattern sampling algorithms, but they do not fulfill the purpose of our task. The major difference of these algorithms with ours is that the sampling distribution of our method changes through user interaction, whereas in the above‐cited works the sampling distribution remains fixed throughout the sampling session.…”
Section: Introduction (mentioning)