Summary
Big data is an emerging trend in modern science that deals with datasets too large and complex to be handled by traditional data processing techniques. It has become central to current technology and business. In practice, many criteria must be considered when implementing it. Two major issues are how the search space is explored to find candidate feature subsets and the prediction performance of the resulting classifiers. Recent work has introduced feature selection methods to address these issues. However, feature selection is a Non‐deterministic Polynomial (NP)‐hard problem, and searching the feature space has become a more difficult task. To solve this problem, this work provides a new approach to feature selection based on the Vertical Split Group FireFly (VSGFF) algorithm. The FireFly (FF) algorithm is inspired by the social behavior of real fireflies. In addition, VSGFF applies the principle of multiple clusters to avoid privacy problems. Finally, Naïve Bayes (NB), K Nearest Neighbor (KNN), and Multi‐Layer Perceptron Neural Network (MLPNN) classification algorithms are applied for big data classification. Experimental results show that the proposed technique improves classification accuracy by 4% compared with the traditional vertical split firefly algorithm.
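As an illustration of firefly‐based feature selection in general, the following is a minimal sketch of a standard binary firefly search. It is not the VSGFF algorithm itself: the vertical split and grouping steps are not described in this summary and are not attempted here. The function name `binary_firefly_select`, the parameters `beta0`, `gamma`, and `alpha`, the sigmoid transfer rule, and the toy fitness function with its `RELEVANT` set are all assumptions made for the example. In practice the fitness would typically be a classifier's validation accuracy on the selected features.

```python
import math
import random


def binary_firefly_select(fitness, n_features, n_fireflies=10, n_iters=30,
                          beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    """Return (best_mask, best_score) for a fitness function to maximise.

    Hypothetical helper: a plain binary firefly search, not VSGFF.
    """
    rng = random.Random(seed)

    def to_mask(position):
        # Sigmoid transfer: a larger coordinate makes a feature more likely to be kept.
        return [1 if rng.random() < 1.0 / (1.0 + math.exp(-x)) else 0
                for x in position]

    # Each firefly has a continuous position with one coordinate per feature.
    positions = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)]
                 for _ in range(n_fireflies)]
    masks = [to_mask(p) for p in positions]
    scores = [fitness(m) for m in masks]

    best = max(range(n_fireflies), key=lambda i: scores[i])
    best_mask, best_score = masks[best][:], scores[best]

    for _ in range(n_iters):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if scores[j] <= scores[i]:
                    continue  # fireflies only move toward brighter ones
                r2 = sum((a - b) ** 2
                         for a, b in zip(positions[i], positions[j]))
                beta = beta0 * math.exp(-gamma * r2)  # attractiveness decays with distance
                positions[i] = [a + beta * (b - a) + alpha * (rng.random() - 0.5)
                                for a, b in zip(positions[i], positions[j])]
                masks[i] = to_mask(positions[i])
                scores[i] = fitness(masks[i])
                if scores[i] > best_score:
                    best_mask, best_score = masks[i][:], scores[i]
    return best_mask, best_score


# Toy fitness (assumption): reward two "relevant" features, penalise the rest.
RELEVANT = {0, 2}


def toy_fitness(mask):
    kept = {i for i, bit in enumerate(mask) if bit}
    return len(kept & RELEVANT) - 0.5 * len(kept - RELEVANT)


mask, score = binary_firefly_select(toy_fitness, n_features=6)
print(mask, score)
```

The search keeps a continuous position per firefly and converts it to a 0/1 feature mask only when the fitness is evaluated, which is one common way to adapt the continuous firefly update to subset selection.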
Many fields work with large databases that contain a high number of features. Feature selection strategies seek to exclude features that are distracting, repetitive, or unnecessary, as they can degrade classification results. Existing approaches lack the scalability needed to handle datasets with millions of instances, and they do not obtain favorable results in a timely manner. This study uses a unique feature selection approach based on an upgraded optimization model and deep machine learning‐based data classification. The proposed model has three stages: (a) feature extraction, (b) optimal feature selection, and (c) classification. Initially, the big datasets are handled efficiently by a parallel pool map‐reduce architecture. Several features are extracted from the input big data using feature extraction (FE) approaches such as the suggested Tri‐Kernel principal component analysis (TK‐PCA), linear discriminant analysis, and linear square regression. The extracted features may still contain data that is irrelevant, out‐of‐date, or noisy, and the larger feature space raises the computing cost. As a result, the best features are selected using a new optimization technique known as Levy Adapted SLnO (LA‐SLnO), which is an improved variant of the original SLnO algorithm. Selecting appropriate features improves the classification accuracy. A Convolutional Neural Network is used for classification in this work. Finally, a comparative evaluation is carried out to validate the efficiency of the proposed model.
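The summary does not state how LA‐SLnO incorporates the Levy component, so the following is only a sketch of the generic building block: drawing a Levy‐flight step with Mantegna's algorithm and using it to perturb a candidate solution. The function names `levy_sigma`, `levy_step`, and `levy_perturb`, the stability index `beta=1.5`, the `scale` factor, and the perturbation rule itself are assumptions chosen for illustration, not the authors' update rule.

```python
import math
import random


def levy_sigma(beta=1.5):
    """Scale of the numerator Gaussian in Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy_step(beta=1.5, rng=random):
    """Draw one heavy-tailed step: u / |v|^(1/beta)."""
    u = rng.gauss(0.0, levy_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # guard against division by zero
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def levy_perturb(position, best, scale=0.01, beta=1.5, rng=random):
    """Hypothetical update: jitter each coordinate relative to the best solution."""
    return [x + scale * levy_step(beta, rng) * (x - b)
            for x, b in zip(position, best)]


rng = random.Random(0)
candidate = [0.4, -1.2, 0.9]
best_so_far = [0.0, 0.0, 0.0]
print(levy_perturb(candidate, best_so_far, rng=rng))
```

Because Levy steps are heavy‐tailed, most perturbations are small while occasional large jumps help a population‐based optimizer escape local optima, which is the usual motivation for adding them to a metaheuristic.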