Mining high utility itemsets from databases is an emerging topic in data mining. It refers to discovering itemsets whose utilities are higher than a user-specified minimum utility threshold, min_util. Although several studies have addressed this topic, setting an appropriate minimum utility threshold is difficult for users. If min_util is set too low, too many high utility itemsets will be generated, which may make mining algorithms inefficient or even cause them to run out of memory. If min_util is set too high, no high utility itemsets will be found. Finding an appropriate threshold by trial and error is tedious. In this paper, we address this problem by proposing a new framework named top-k high utility itemset mining, where k is the desired number of high utility itemsets to be mined. We propose an efficient algorithm named TKU (Top-K Utility itemsets mining) that mines such itemsets without requiring min_util to be set. TKU includes several features designed to address the new challenges this problem raises, such as the absence of the anti-monotone property and the requirement of lossless results. Moreover, TKU incorporates several novel strategies for pruning the search space to achieve high efficiency. Results on real and synthetic datasets show that TKU has excellent performance and scalability.
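To make the problem definition concrete, here is a minimal sketch of top-k high utility itemset mining by brute-force enumeration. It is not TKU; the abstract does not describe TKU's data structures or strategies. It assumes the usual utility model (an item's utility in a transaction is its quantity times its unit profit, and an itemset's utility is summed over the transactions that contain it) and shows the key idea of replacing min_util with an internal border threshold that rises as the top-k list fills. As a stand-in for pruning it uses transaction-weighted utility (TWU), a standard upper bound on itemset utility from earlier utility mining work. The function names and the example data are hypothetical.

```python
import heapq
from itertools import combinations

def transaction_utility(t, profit):
    # Utility of a whole transaction: sum of quantity * unit profit.
    return sum(q * profit[i] for i, q in t.items())

def itemset_utility(itemset, transactions, profit):
    # Sum the itemset's utility over every transaction that contains all its items.
    return sum(
        sum(t[i] * profit[i] for i in itemset)
        for t in transactions
        if all(i in t for i in itemset)
    )

def twu(itemset, transactions, profit):
    # Transaction-weighted utility: an upper bound on itemset_utility.
    return sum(
        transaction_utility(t, profit)
        for t in transactions
        if all(i in t for i in itemset)
    )

def top_k_hui(transactions, profit, k):
    """Naive top-k search (hypothetical helper, not TKU).

    Keeps a min-heap of the best k (utility, itemset) pairs. Once the heap
    is full, its smallest utility acts as a rising internal threshold, so
    no user-set min_util is needed. Ties at the k-th position are dropped.
    """
    items = sorted(profit)
    heap = []
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            border = heap[0][0] if len(heap) == k else 0
            # Skip itemsets whose upper bound cannot beat the border.
            if twu(itemset, transactions, profit) <= border:
                continue
            u = itemset_utility(itemset, transactions, profit)
            if len(heap) < k:
                heapq.heappush(heap, (u, itemset))
            elif u > heap[0][0]:
                heapq.heapreplace(heap, (u, itemset))
    return sorted(heap, reverse=True)

# Hypothetical example: unit profits and per-transaction quantities.
profit = {"a": 5, "b": 2, "c": 1, "d": 2}
transactions = [
    {"a": 1, "c": 1, "d": 1},
    {"a": 2, "c": 6},
    {"b": 4, "c": 3, "d": 3},
    {"b": 2, "c": 2},
]
result = top_k_hui(transactions, profit, 3)
```

Enumerating every combination is exponential in the number of items, which is exactly why algorithms such as TKU rely on compact structures and stronger threshold-raising strategies. The sketch only illustrates what is being computed.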
Data stream mining has become an emerging research topic in the data mining field, and finding frequent itemsets is an important task in data stream mining with wide applications. Recently, utility mining has received extensive attention because it reconsiders two issues: first, the utility (e.g., profit) of each item may differ in real applications; second, frequent itemsets might not produce the highest utility. In this paper, we propose a novel algorithm named GUIDE (Generation of temporal maximal Utility Itemsets from Data strEams) that finds temporal maximal utility itemsets from data streams. We also propose a novel data structure, the TMUI-tree (Temporal Maximal Utility Itemset tree), for efficiently capturing the utility of each itemset in a single scan. The main contributions of this paper are as follows: 1) GUIDE is the first one-pass utility-based algorithm for mining temporal maximal utility itemsets from data streams, and 2) the TMUI-tree is efficient and easy to maintain. Experimental results show that our approach outperforms existing utility mining algorithms, such as the Two-Phase algorithm, in data stream environments.
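The second issue above, that frequent itemsets might not produce the highest utility, can be illustrated with a small sketch. This does not implement GUIDE or the TMUI-tree, whose details the abstract does not give. It assumes a fixed-size sliding window over the stream, processes each arriving transaction once, and tracks only single items rather than itemsets. The class name `SlidingWindowUtility` and the example data are hypothetical.

```python
from collections import Counter, deque

class SlidingWindowUtility:
    """Per-item support and utility over the last `size` transactions (hypothetical helper)."""

    def __init__(self, size, profit):
        self.size = size
        self.profit = profit          # item -> unit profit
        self.window = deque()         # transactions currently in the window
        self.support = Counter()      # item -> number of transactions containing it
        self.utility = Counter()      # item -> sum of quantity * unit profit

    def add(self, transaction):
        # Each arriving transaction is applied once; the oldest one is
        # subtracted when it slides out of the window.
        self.window.append(transaction)
        self._apply(transaction, +1)
        if len(self.window) > self.size:
            self._apply(self.window.popleft(), -1)

    def _apply(self, transaction, sign):
        for item, qty in transaction.items():
            self.support[item] += sign
            self.utility[item] += sign * qty * self.profit[item]

    def most_frequent(self):
        return max(self.support, key=self.support.__getitem__)

    def highest_utility(self):
        return max(self.utility, key=self.utility.__getitem__)

# Hypothetical stream: cheap items are frequent, an expensive item is rarer.
profit = {"bread": 1, "milk": 2, "wine": 30}
stream = [
    {"bread": 2, "milk": 1},
    {"bread": 1, "wine": 1},
    {"bread": 3},
    {"bread": 1, "milk": 2},
    {"milk": 1, "wine": 2},
]
w = SlidingWindowUtility(size=4, profit=profit)
for t in stream:
    w.add(t)
```

In the final window, "bread" appears in the most transactions, yet "wine" contributes by far the most utility, so a support-based miner and a utility-based miner would rank them differently.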