Data mining techniques are useful for discovering hidden knowledge in large databases. One common technique is sequential rule mining. Sequential rule (SR) mining finds all sequential rules that meet the minimum support and confidence thresholds, which supports prediction. It is an alternative to sequential pattern mining in that it takes into account the probability that a pattern will be followed by another. In this paper, we examine how best to use sequential rule mining algorithms by applying them to databases with different characteristics, with the aim of improving efficiency in different fields of application. The three compared algorithms are TRuleGrowth, an extension of the RuleGrowth sequential rule algorithm; the top-k non-redundant sequential rules algorithm (TNS); and the non-redundant dynamic bit vector algorithm (NRD-DBV). The analysis compares the three algorithms in terms of run time, number of rules produced, and memory usage to determine which is best suited to prediction. Additionally, it explores the most suitable applications for each algorithm. The experimental results show that the performance of the algorithms depends on the dataset characteristics. They also show that altering the window size constraint, fixing the number of rules to generate, or changing the value of the minSup threshold can reduce execution time and control the number of valid rules generated.
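To make the support and confidence thresholds mentioned above concrete, the following is a minimal sketch and not the implementation of any of the three compared algorithms. It assumes the definition commonly used for this family of rules: a rule X → Y holds in a sequence when every item of X appears before every item of Y, support is the fraction of sequences in which the rule holds, and confidence is that count divided by the number of sequences containing all items of X. The function names and the toy database are hypothetical.

```python
from typing import List, Set, Tuple

# Assumption: a sequence is an ordered list of itemsets.
Sequence = List[Set[str]]


def rule_occurs(seq: Sequence, antecedent: Set[str], consequent: Set[str]) -> bool:
    """Return True if all antecedent items appear before all consequent items."""
    remaining = set(antecedent)
    for i, itemset in enumerate(seq):
        remaining -= itemset
        if not remaining:
            # Earliest point where the antecedent is complete; the consequent
            # must be fully contained in the itemsets that follow.
            later = set().union(*seq[i + 1:])
            return consequent <= later
    return False


def contains_all(seq: Sequence, items: Set[str]) -> bool:
    """Return True if every item occurs somewhere in the sequence."""
    return items <= set().union(*seq)


def support_and_confidence(
    db: List[Sequence], antecedent: Set[str], consequent: Set[str]
) -> Tuple[float, float]:
    """Compute (support, confidence) of antecedent -> consequent over db."""
    n_rule = sum(rule_occurs(s, antecedent, consequent) for s in db)
    n_ante = sum(contains_all(s, antecedent) for s in db)
    support = n_rule / len(db) if db else 0.0
    confidence = n_rule / n_ante if n_ante else 0.0
    return support, confidence


# Hypothetical toy database of four sequences.
example_db: List[Sequence] = [
    [{"a"}, {"b"}, {"c"}],
    [{"a"}, {"c"}, {"b"}],
    [{"b"}, {"a"}],
    [{"a", "b"}, {"c"}],
]

sup, conf = support_and_confidence(example_db, {"a"}, {"c"})
```

A rule would be reported only if `sup` meets the minSup threshold and `conf` meets the minimum confidence threshold; raising either threshold reduces the number of valid rules.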
Sequential rule mining is one of the most common data mining techniques. It aims to find the desired rules in large sequence databases. It can identify the essential information that helps acquire knowledge from large search spaces and select interesting rules from sequence databases. The key challenge is to avoid wasting time, which is particularly difficult in large sequence databases. This paper studies mining rules from two representations of sequential patterns in order to obtain compact databases without affecting the final result. In addition, it executes a parallel approach that uses a multi-core processor architecture to mine non-redundant sequential rules, and it applies pruning techniques to enhance the efficiency of rule generation. The proposed algorithm was evaluated by comparing it with another non-redundant sequential rule algorithm, Non-Redundant with Dynamic Bit Vector (NRD-DBV). Both algorithms were run on four real datasets with different characteristics. Our experiments show the performance of the proposed algorithm in terms of execution time and computational cost. It achieves the highest efficiency, especially for large datasets and low minimum support values, taking approximately half the time consumed by the compared algorithm.
COVID-19 (coronavirus) is a catastrophic disease that has caused a global health crisis. By its nature, COVID-19 spreads quickly among humans and has infected millions of people worldwide within a short period. It is critical to understand the behaviour of COVID-19 and how rapidly it mutates in order to improve medications and help patients prevent the progression of the disease. This paper examines the discovery of additional information and interesting patterns in COVID-19 genome sequences. An enhanced non-redundant sequential rule algorithm mines rules from frequent closed dynamic bit vector patterns and sequential generator patterns simultaneously. It quickly discovers nucleotide rules and predicts the next nucleotide after eliminating non-candidate sequential patterns early. Almost all genotyping tests are partial, time-consuming, and involve multi-step processes. Therefore, an efficient parallel approach is implemented using a multicore processor architecture to produce the sequential rules in less time. The experimental results show that the proposed Parallel Non-Redundant Dynamic closed generator (PNRD-CloGen) algorithm performs well in terms of execution time, computational cost, and scalability. It performs better especially for large datasets and low minimum support values, taking around half the time of the competing algorithm. It therefore helps to monitor the strain progression of COVID-19 sequentially and to enhance clinical management.