Searching for frequent patterns in transactional databases is considered one of the most important data mining problems, and Apriori is one of the typical algorithms for this task. Developing fast and efficient algorithms that can handle large volumes of data is challenging because of the size of these databases. In this paper, we implement a parallel Apriori algorithm based on MapReduce, a framework for processing huge datasets on certain kinds of distributable problems using a large number of computers (nodes). The experimental results demonstrate that the proposed algorithm scales well and efficiently processes large datasets on commodity hardware.
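To make the idea concrete, the following is a minimal single-process sketch of how one Apriori pass can be phrased as a map step and a reduce step. It is not the paper's implementation: the function names (`map_phase`, `reduce_phase`, `apriori_mapreduce`), the representation of data splits as in-memory lists, and the absolute `min_support` count are all assumptions made for illustration, and a real deployment would run the mapper and reducer on a MapReduce cluster rather than in one Python process.

```python
from collections import Counter
from itertools import combinations


def map_phase(split, k, prev_frequent):
    """Mapper (hypothetical): emit (candidate, 1) for each k-itemset in one data split."""
    for transaction in split:
        items = sorted(set(transaction))
        for candidate in combinations(items, k):
            # Apriori pruning: every (k-1)-subset must already be frequent.
            if k == 1 or all(sub in prev_frequent
                             for sub in combinations(candidate, k - 1)):
                yield candidate, 1


def reduce_phase(pairs, min_support):
    """Reducer (hypothetical): sum counts per itemset, keep those meeting min_support."""
    counts = Counter()
    for itemset, count in pairs:
        counts[itemset] += count
    return {s: c for s, c in counts.items() if c >= min_support}


def apriori_mapreduce(splits, min_support):
    """Run one map/reduce round per itemset size until no frequent itemset remains."""
    frequent, prev, k = {}, set(), 1
    while True:
        # In a real cluster each split would be handled by a separate mapper node.
        pairs = (p for split in splits for p in map_phase(split, k, prev))
        level = reduce_phase(pairs, min_support)
        if not level:
            return frequent
        frequent.update(level)
        prev, k = set(level), k + 1
```

Calling `apriori_mapreduce(splits, min_support)` with `splits` as a list of lists of transactions returns a dictionary mapping each frequent itemset (a sorted tuple of items) to its support count. Each iteration of the loop corresponds to one MapReduce job, which is the usual way the level-wise structure of Apriori is mapped onto the framework.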
To improve the exploitation ability of the sine cosine algorithm (SCA), an improved symmetric SCA with adaptive probability selection (SSCA-APS) is proposed. The search process of this algorithm is divided into early and late stages. In the early stage, the operators of the traditional SCA continue to be used. In the late stage, three improvements are applied. First, symmetric sine and cosine operators are proposed, and an adaptive probability selection strategy is adopted to integrate the original and the symmetric sine and cosine operators, dynamically adjusting the step size of the search range. Second, to prevent the population from falling into local optima, Gaussian perturbation is used to mutate the globally optimal individual of the current generation. Third, the information of two randomly selected individuals and the globally optimal individual is integrated by quadratic interpolation to maintain population diversity and produce a new individual. Twenty-three test functions were used to verify the performance of the proposed algorithm. The simulation results indicate that SSCA-APS is competitive with the classical SCA and some state-of-the-art SCA variants.
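The two-stage structure can be illustrated with a rough sketch built on the classical SCA update. The abstract does not define the symmetric operators or the adaptive probability rule, so they are deliberately left out here. Everything else specific to this sketch is an assumption: the switch to the late stage at half the iteration budget, the fixed perturbation width `sigma`, the greedy acceptance of trial points, and all function names.

```python
import math
import random


def sca_step(x, best, r1, rng):
    """Classical SCA update of one individual, dimension by dimension."""
    new = []
    for xj, pj in zip(x, best):
        r2 = rng.uniform(0.0, 2.0 * math.pi)
        r3 = rng.uniform(0.0, 2.0)
        trig = math.sin(r2) if rng.random() < 0.5 else math.cos(r2)
        new.append(xj + r1 * trig * abs(r3 * pj - xj))
    return new


def gaussian_perturb(best, sigma, rng):
    """Mutate the best individual with zero-mean Gaussian noise."""
    return [pj + rng.gauss(0.0, sigma) for pj in best]


def quadratic_interpolate(a, b, c, fa, fb, fc, eps=1e-12):
    """Per-dimension vertex of the parabola through (a, fa), (b, fb), (c, fc)."""
    out = []
    for aj, bj, cj in zip(a, b, c):
        num = (bj * bj - cj * cj) * fa + (cj * cj - aj * aj) * fb + (aj * aj - bj * bj) * fc
        den = (bj - cj) * fa + (cj - aj) * fb + (aj - bj) * fc
        out.append(0.5 * num / den if abs(den) > eps else cj)  # fall back to c if degenerate
    return out


def minimize(f, dim, bounds, pop_size=20, iters=200, amp=2.0, seed=0):
    """Two-stage loop: plain SCA early, SCA plus the two extra trial points late."""
    rng = random.Random(seed)
    lo, hi = bounds
    sigma = 0.01 * (hi - lo)  # assumed perturbation width

    def clip(v):
        return [min(max(vj, lo), hi) for vj in v]

    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    best = min(pop, key=f)
    best_f = f(best)
    for t in range(iters):
        r1 = amp - t * amp / iters  # step amplitude decays linearly to zero
        pop = [clip(sca_step(x, best, r1, rng)) for x in pop]
        candidates = list(pop)
        if t >= iters // 2:  # assumed boundary between early and late stage
            candidates.append(clip(gaussian_perturb(best, sigma, rng)))
            p, q = rng.sample(pop, 2)
            candidates.append(clip(quadratic_interpolate(p, q, best, f(p), f(q), best_f)))
        for cand in candidates:
            f_cand = f(cand)
            if f_cand < best_f:  # greedy acceptance (assumption)
                best, best_f = cand, f_cand
    return best, best_f
```

For example, `minimize(lambda v: sum(t * t for t in v), dim=3, bounds=(-10.0, 10.0))` runs the sketch on the sphere function and returns the best point found together with its objective value. Because the best individual is only ever replaced by a strictly better trial, the returned value never exceeds the best value in the initial population.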
Batch process data are time-varying, dynamic, and non-Gaussian distributed. In addition, in multivariate statistical process monitoring, their variability can be overwhelmed when local variability behavior is considered. To address these issues, an improved batch process monitoring approach is presented that integrates just-in-time learning and multiple subspace support vector data description (JITL-MSSVDD). A new multiple subspace segmentation method is proposed that classifies a contribution array calculated from the mixing matrix of independent component analysis (ICA). Offline, the variable subspace segmentation rule is obtained with the proposed method; monitoring each subspace with its own model reduces the risk of the variability being overwhelmed. Online, local modeling samples are collected through JITL, which reduces the impact of the time-varying dynamics on modeling accuracy. Then, in accordance with the variable subspace segmentation rule obtained offline, MSSVDD models are constructed to handle the non-Gaussian distribution. The advantage of the proposed JITL-MSSVDD is demonstrated on the fed-batch penicillin fermentation process, a standard test model.
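The online part of this scheme can be sketched as follows, under several loudly stated assumptions. JITL is approximated by picking the `k` historical samples nearest to the query in Euclidean distance, the subspace rule is passed in as a hypothetical dictionary of variable indices rather than derived from an ICA mixing matrix, and SVDD is replaced by a much cruder stand-in: a sphere centred on the mean of the local samples whose radius is the largest training distance. The stand-in is not kernel SVDD; it only shows where a per-subspace boundary model plugs in.

```python
import math


def select_local_samples(history, query, k):
    """JITL-style step: the k historical samples closest to the query."""
    return sorted(history, key=lambda s: math.dist(s, query))[:k]


def project(sample, indices):
    """Keep only the variables that belong to one subspace."""
    return [sample[j] for j in indices]


def fit_sphere(samples):
    """Stand-in for SVDD: mean-centred sphere enclosing all training samples."""
    dim = len(samples[0])
    center = [sum(s[j] for s in samples) / len(samples) for j in range(dim)]
    radius = max(math.dist(s, center) for s in samples)
    return center, radius


def monitor(history, query, subspaces, k):
    """Fit one local model per subspace and flag the query where it falls outside."""
    local = select_local_samples(history, query, k)
    alarms = {}
    for name, indices in subspaces.items():
        center, radius = fit_sphere([project(s, indices) for s in local])
        alarms[name] = math.dist(project(query, indices), center) > radius
    return alarms
```

A call such as `monitor(history, query, {"A": [0, 1], "B": [2]}, k=4)` returns one boolean per subspace. Keeping the variables of subspace B in their own model means a deviation confined to variable 2 is judged against that variable's own spread instead of being diluted by the others, which is the motivation the abstract gives for subspace monitoring.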