Periodic high-utility sequential pattern mining (PHUSPM) extracts periodically occurring high-utility sequential patterns (HUSPs) from a quantitative sequence database according to a user-specified minimum utility threshold (minutil). A sequential pattern’s periodicity is judged by checking whether any of its periods (the time between two consecutive occurrences of the pattern) exceeds a user-specified maximum periodicity threshold (maxPer). Because this threshold is applied strictly, traditional PHUSPM methods discard some useful sequential patterns, and the periods of some retained patterns fluctuate greatly (i.e., are unstable). In frequent itemset mining (FIM), researchers have proposed strategies to solve these problems. Because of the symmetry of frequent itemset patterns (FIPs), these strategies cannot be directly applied to PHUSPM. To address these issues, this work proposes the stable periodic high-utility sequential pattern mining (SPHUSPM) algorithm. The contributions of this paper are as follows. First, we introduce the concept of stability to overcome the abovementioned problems, mine sequential patterns with stable periodic behavior, and propose the concept of stable periodic high-utility sequential patterns (SPHUSPs) for the first time. Second, we design a new data structure named the PUL-list to record the periodic information of sequential patterns, thereby improving mining efficiency. Third, we propose the maximum lability pruning strategy in sequential pattern (MLPS), which can prune a large number of unstable sequential patterns in advance. To assess the algorithm’s effectiveness, we conduct extensive experiments. The results show that the algorithm not only mines patterns that are ignored by traditional algorithms, but also ensures that the discovered patterns have stable periodic behavior.
In addition, with the MLPS pruning strategy, the algorithm prunes 46.5% of candidates in advance on average across six datasets. Pruning a large number of candidates in advance not only speeds up the mining process, but also greatly reduces memory usage.
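The difference between the strict maxPer test and a stability-based test can be illustrated with a minimal sketch. It is not the SPHUSPM algorithm or the PUL-list. The function names, the convention of counting the leading and trailing gaps as periods, and the cumulative-excess definition of lability (borrowed from the stable periodic itemset mining literature) are all assumptions made for illustration; the abstract does not specify them.

```python
from typing import List


def periods(occurrences: List[int], db_size: int) -> List[int]:
    """Gaps between consecutive occurrences of a pattern.

    Assumption: occurrences are 1-based sequence positions in the database,
    and the gap before the first occurrence and after the last one also
    count as periods.
    """
    points = [0] + sorted(occurrences) + [db_size]
    return [b - a for a, b in zip(points, points[1:])]


def is_strictly_periodic(occurrences: List[int], db_size: int, max_per: int) -> bool:
    """Strict test: the pattern is rejected as soon as one period exceeds max_per."""
    return max(periods(occurrences, db_size)) <= max_per


def max_lability(occurrences: List[int], db_size: int, max_per: int) -> int:
    """Assumed lability measure: accumulated excess over max_per, floored at 0.

    Short periods let the excess decay, so a single long gap does not
    disqualify a pattern that is otherwise regular.
    """
    lability = 0
    worst = 0
    for p in periods(occurrences, db_size):
        lability = max(0, lability + p - max_per)
        worst = max(worst, lability)
    return worst


# Hypothetical pattern occurring in sequences 2, 4, 5, 9 and 10 of a 12-sequence database.
occ = [2, 4, 5, 9, 10]
print(periods(occ, 12))                  # [2, 2, 1, 4, 1, 2]
print(is_strictly_periodic(occ, 12, 3))  # False: one period of 4 exceeds maxPer = 3
print(max_lability(occ, 12, 3))          # 1: small enough to pass a lability threshold of 1
```

Under these assumptions the pattern fails the strict test because of one long gap, yet its maximum lability stays low, which is the kind of pattern a stability-based criterion is meant to keep.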
High utility sequential pattern (HUSP) mining aims to mine actionable patterns with high utilities and is widely applied in real-world scenarios such as market basket analysis, scenic route planning and click-stream analysis. Existing HUSP mining algorithms mainly attempt to improve computational efficiency while maintaining algorithm stability on large-scale data. Although these methods have made some progress, they ignore the relationship between additional items and underlying sequences, which directly leads to the generation of redundant sequential patterns sharing the same underlying sequence. Hence, the actionability of the mined patterns is limited, which significantly compromises their performance in real-world applications. To address this problem, we present a new method named Combined Utility-Association Sequential Pattern Mining (CUASPM), which incorporates item/sequence relations to effectively remove redundant patterns and extract highly discriminative and strongly associated sequential pattern combinations with high utilities. Specifically, we introduce the concept of actionable combined mining into HUSP mining for the first time and develop a novel tree structure to select discriminative high utility sequential patterns (HUSPs) for downstream tasks. Furthermore, two efficient strategies (i.e., global and local strategies) are presented to facilitate mining HUSPs while guaranteeing utility growth and high levels of association. Finally, two parameters are introduced to evaluate the interestingness of patterns and to choose the most useful actionable combined HUSPs (ACHUSPs). Extensive experimental results demonstrate that the proposed CUASPM outperforms the baselines in terms of execution time, memory usage, and the mining of highly discriminative and strongly associated HUSPs.