Abstract. High-utility itemset mining (HUIM) is an important data mining task with wide applications. In this paper, we propose a novel algorithm named EFIM (EFficient high-utility Itemset Mining), which introduces several new ideas to discover high-utility itemsets more efficiently, in terms of both execution time and memory. EFIM relies on two upper bounds named sub-tree utility and local utility to more effectively prune the search space. It also introduces a novel array-based utility counting technique named Fast Utility Counting to calculate these upper bounds in linear time and space. Moreover, to reduce the cost of database scans, EFIM proposes efficient database projection and transaction merging techniques. An extensive experimental study on various datasets shows that EFIM is in general two to three orders of magnitude faster and consumes up to eight times less memory than the state-of-the-art algorithms d2HUP, HUI-Miner, HUP-Miner, FHM and UP-Growth+.
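The idea of array-based utility counting can be illustrated with a minimal sketch. It is not EFIM's implementation: the abstract does not define sub-tree utility or local utility, so the bound accumulated here is the classic transaction-weighted utilization, used as a stand-in. The function name `utility_bins`, the data layout, and the assumption that items are small integer ids are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

# Assumption: a transaction is a list of (item_id, utility) pairs, and item
# ids are integers in range(num_items) so they can index an array directly.
Transaction = List[Tuple[int, int]]


def utility_bins(db: List[Transaction], num_items: int) -> List[int]:
    """For each item, sum the utilities of all transactions containing it.

    One pass over the database and one array of length num_items, so the
    cost is linear in the database size and in the number of items.
    """
    bins = [0] * num_items
    for transaction in db:
        # Transaction utility: total utility of all items in the transaction.
        tu = sum(utility for _, utility in transaction)
        for item, _ in transaction:
            bins[item] += tu
    return bins


# Hypothetical toy database with three items (ids 0, 1, 2).
db = [
    [(0, 5), (1, 2)],          # transaction utility 7
    [(1, 4), (2, 3)],          # transaction utility 7
    [(0, 1), (1, 1), (2, 6)],  # transaction utility 8
]
bins = utility_bins(db, 3)
print(bins)  # [15, 22, 15]
```

An item whose bin value falls below the minimum utility threshold cannot appear in any high-utility itemset under this bound, so it can be discarded before the search starts.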
Sequential rule mining is an important data mining task with wide applications. The current state-of-the-art algorithm (RuleGrowth) for this task relies on a pattern-growth approach to discover sequential rules. A drawback of this approach is that it repeatedly performs a costly database projection operation, which deteriorates performance for datasets containing dense or long sequences. In this paper, we address this issue by proposing an algorithm named ERMiner (Equivalence class based sequential Rule Miner) for mining sequential rules. It relies on the novel idea of searching using equivalence classes of rules having the same antecedent or consequent. Furthermore, it includes a data structure named SCM (Sparse Count Matrix) to prune the search space. An extensive experimental study with five real-life datasets shows that ERMiner is up to five times faster than RuleGrowth but consumes more memory.
High utility itemset (HUI) mining is a popular data mining task, which consists of discovering sets of items generating high profit in a transaction database. Recently, several efficient algorithms have been proposed for this task. However, most of them do not consider the on-shelf time periods of items, which leads to a bias toward items having more shelf time. Moreover, most algorithms cannot handle databases containing items with a negative unit profit, although this case is very common in real transaction databases. In this paper, we address both of these challenges by proposing a novel efficient algorithm named FOSHU (Faster On-Shelf High Utility itemset miner) to mine HUIs while considering on-shelf time periods of items, and items having positive and/or negative unit profit. An extensive experimental study with real-life datasets shows that the proposed algorithm can be more than 1000 times faster and use up to 10 times less memory than the state-of-the-art algorithm TS-HOUN for this task. Moreover, experiments show that the proposed algorithm performs well on dense databases and databases containing many time periods.
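As a small worked example of what "profit generated by a set of items" means, the sketch below computes the utility of an itemset as the sum, over every transaction containing all of its items, of purchase quantity times unit profit. It uses the standard HUIM definition and ignores on-shelf time periods, which the abstract does not specify; the function name `itemset_utility` and the toy data are hypothetical.

```python
from typing import Dict, Iterable, List


def itemset_utility(
    db: List[Dict[str, int]],
    unit_profit: Dict[str, int],
    itemset: Iterable[str],
) -> int:
    """Total profit of `itemset` over the transactions that contain all of it.

    Each transaction maps an item to its purchased quantity. Unit profits
    may be negative (for example, items sold at a loss).
    """
    items = list(itemset)
    total = 0
    for transaction in db:
        if all(item in transaction for item in items):
            total += sum(transaction[item] * unit_profit[item] for item in items)
    return total


# Hypothetical data: item "c" has a negative unit profit.
unit_profit = {"a": 5, "b": 2, "c": -1}
db = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 3},
    {"a": 1, "b": 1, "c": 2},
]

u_ab = itemset_utility(db, unit_profit, {"a", "b"})  # (5 + 4) + (5 + 2) = 16
u_ac = itemset_utility(db, unit_profit, {"a", "c"})  # (10 - 3) + (5 - 2) = 10
print(u_ab, u_ac)
```

An itemset is a high utility itemset when this value reaches a user-chosen minimum utility threshold. With negative unit profits, adding an item to an itemset can lower its utility within a transaction, which is one reason such items need special handling.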