International audienceClustering is one of the major tasks in data mining. However, selecting an algorithm to cluster a dataset is a difficult task, especially if there is no prior knowledge on the structure of the data. Consensus clustering methods can be used to combine multiple base clusterings into a new solution that provides better partitioning. In this work, we present a new consensus clustering method based on detecting clustering patterns by mining frequent closed itemset. Instead of generating one consensus, this method both generates multiple consensuses based on varying the number of base clusterings, and links these solutions in a hierarchical representation that eases the selection of the best clustering. This hierarchical view also provides an analysis tool, for example to discover strong clusters or outlier instances
Clustering is the process of partitioning a dataset into groups based on the similarity between the instances. Many clustering algorithms were proposed, but none of them proved to provide good quality partition in all situations. Consensus clustering aims to enhance the clustering process by combining different partitions obtained from different algorithms to yield a better quality consensus solution. In this work, we propose a new consensus clustering method that uses a pattern mining technique in order to reduce the search space from instance-based into pattern-based space. Instead of finding one solution, our method generates multiple consensus candidates based on varying the number of base clusterings considered. The different solutions are then linked and presented as a tree that gives more insight about the similarities between the instances and the different partitions in the ensemble.
Clustering is the process of partitioning a dataset into groups based on the similarity between the instances. Many clustering algorithms were proposed, but none of them proved to provide good quality partition in all situations. Consensus clustering aims to enhance the clustering process by combining different partitions obtained from different algorithms to yield a better quality consensus solution. In this work, we propose a new consensus method that uses a pattern mining technique in order to reduce the search space from instance-based into pattern-based space. Instead of finding one solution, our method generates multiple consensus candidates based on varying the number of base clusterings considered. The different solutions are then linked and presented as a tree that gives more insight about the similarities between the instances and the different partitions in the ensemble.
Abstract. Discovering high-utility temsets in transaction databases is a popular data mining task. A limitation of traditional algorithms is that a huge amount of high-utility itemsets may be presented to the user. To provide a concise and lossless representation of results to the user, the concept of closed high-utility itemsets was proposed. However, mining closed high-utility itemsets is computationally expensive. To address this issue, we present a novel algorithm for discovering closed high-utility itemsets, named EFIM-Closed. This algorithm includes novel pruning strategies named closure jumping, forward closure checking and backward closure checking to prune non-closed high-utility itemsets. Furthermore, it also introduces novel utility upper-bounds and a transaction merging mechanism. Experimental results shows that EFIM-Closed can be more than an order of magnitude faster and consumes more than an order of magnitude less memory than the previous state-of-art CHUD algorithm.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.