A false negative approach to mining frequent itemsets from high speed transactional data streams

Yu, Jeffrey Xu; Chong, Zhihong; Lü, Hongjun; Zhang, Zhenjie; Zhou, Aoying

doi:10.1016/j.ins.2005.11.003

Cited by 79 publications

(52 citation statements)

References 16 publications


“…The benchmark datasets used in our experiments may not be representative of the particular type of data sets where users want to find the maximum length frequent itemsets. Also, other requirements may be added in the mining process for the longest patterns, such as those in correlation [21], data stream [22] and temporal pattern [13] mining. Finally, since LFI has a potential to be an interesting pattern to preserve during clustering, another direction is to exploit LFI for transaction clustering.…”

Section: Discussion (mentioning)

confidence: 99%

Discovery of maximum length frequent itemsets

Sung, Xiong, et al. (2008)

Information Sciences


The use of frequent itemsets has been limited by the high computational cost as well as the large number of resulting itemsets. In many real-world scenarios, however, it is often sufficient to mine a small representative subset of frequent itemsets with low computational cost. To that end, in this paper, we define a new problem of finding the frequent itemsets with a maximum length and present a novel algorithm to solve this problem. Indeed, maximum length frequent itemsets can be efficiently identified in very large data sets and are useful in many application domains. Our algorithm generates the maximum length frequent itemsets by adapting a pattern fragment growth methodology based on the FP-tree structure. Also, a number of optimization techniques have been exploited to prune the search space. Finally, extensive experiments on real-world data sets validate the proposed algorithm.


“…Algorithms for random streams: Yu et al [16] presented another algorithm for transaction stream mining. The main idea in their approach is to keep a list of potentially frequent itemsets, and to update the list in a clever way when advancing the stream.…”

Section: Related Work (mentioning)

confidence: 99%

Frequent Pairs in Data Streams: Exploiting Parallelism and Skew

Campagna, Kutzkov, Pagh (2011)

2011 IEEE 11th International Conference on Data Mining Workshops


Abstract: We introduce the Pair Streaming Engine (PairSE) that detects frequent pairs in a data stream of transactions. Our algorithm finds the most frequent pairs with high probability, and gives tight bounds on their frequency. It is particularly space efficient for skewed distribution of pair supports, confirmed for several real-world datasets. Additionally, the algorithm parallelizes easily, which opens up for real-time processing of large transactions. Unlike previous algorithms we make no assumptions on the order of arrival of transactions and pairs. Our algorithm builds upon approaches for frequent items mining in data streams. We show how to efficiently scale these approaches to handle large transactions. We report experimental results showcasing precision and recall of our method. In particular, we find that often our method achieves excellent precision, returning identical upper and lower bounds on the supports of the most frequent pairs.


“…A false negative approach: Yu et al [11] present algorithms directly addressing the problem of finding frequent itemsets in a transaction stream. The algorithm does not find itemsets that are similar by means of measure functions other than support.…”

Section: A Previous Work (mentioning)

confidence: 99%

On Finding Similar Items in a Stream of Transactions

Campagna, Pagh (2010)

2010 IEEE International Conference on Data Mining Workshops


Abstract: While there has been a lot of work on finding frequent itemsets in transaction data streams, none of these solve the problem of finding similar pairs according to standard similarity measures. This paper is a first attempt at dealing with this, arguably more important, problem. We start out with a negative result that also explains the lack of theoretical upper bounds on the space usage of data mining algorithms for finding frequent itemsets: Any algorithm that (even only approximately and with a chance of error) finds the most frequent k-itemset must use space Ω(min{mb, n^k, (mb/ϕ)^k}) bits, where mb is the number of items in the stream so far, n is the number of distinct items and ϕ is a support threshold. To achieve any non-trivial space upper bound we must thus abandon a worst-case assumption on the data stream. We work under the model that the transactions come in random order, and show that surprisingly, not only is small-space similarity mining possible for the most common similarity measures, but the mining accuracy improves with the length of the stream for any fixed support threshold.
