R. Senthamil Selvi scite author profile

Summary: Big data is an emerging trend in modern science that deals with datasets too large and complex to be handled by traditional data processing techniques, and it has become central to current technology and business. In practice, many criteria must be considered when implementing big data techniques. Two major issues are how to search the space of candidate feature subsets and how to obtain good prediction performance from classifiers. Feature selection methods have been introduced in recent work to address these issues. However, feature selection is Non‐deterministic Polynomial (NP) hard, and searching the space becomes increasingly difficult. To solve this problem, this work presents a new feature selection approach based on the Vertical Split Group FireFly (VSGFF) algorithm. The FireFly (FF) algorithm is inspired by the social behavior of real fireflies. VSGFF applies the principle of multiple clusters to avoid privacy problems. Finally, Naïve Bayes (NB), K Nearest Neighbor (KNN), and Multi‐Layer Perceptron Neural Network (MLPNN) classification algorithms are used for big data classification. Experimental results show that the proposed technique improves classification accuracy by 4% compared to the traditional vertical split firefly algorithm.
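As a rough illustration of firefly-based feature selection on vertically split feature groups, the Python sketch below runs an independent binary firefly search on each contiguous slice of columns and merges the selected columns. It is not the authors' VSGFF: the grouping rule (`vertical_split`), the threshold at 0 for turning positions into feature masks, the parameter values, the toy dataset (`make_toy_data`), and the fitness (leave-one-out 1-nearest-neighbour accuracy minus a small per-feature penalty, standing in for the NB, KNN, and MLPNN classifiers) are all assumptions. The privacy aspect of the multiple-cluster design is not modeled.

```python
import math
import random
from functools import lru_cache
from typing import Callable, List, Sequence, Tuple

Mask = Tuple[int, ...]


def to_mask(position: Sequence[float]) -> Mask:
    # Keep feature k when sigmoid(x_k) > 0.5, which is the same as x_k > 0.
    return tuple(1 if x > 0.0 else 0 for x in position)


def firefly_select(n_dims: int, fitness: Callable[[Mask], float], n_fireflies: int = 10,
                   n_iter: int = 30, beta0: float = 1.0, gamma: float = 0.5,
                   alpha: float = 0.5, seed: int = 0) -> Tuple[Mask, float]:
    """Binary firefly search: each firefly moves towards every brighter (fitter) one."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1.0, 1.0) for _ in range(n_dims)] for _ in range(n_fireflies)]
    light = [fitness(to_mask(p)) for p in pos]
    k = max(range(n_fireflies), key=lambda i: light[i])
    best_mask, best_score = to_mask(pos[k]), light[k]
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if light[j] > light[i]:
                    r2 = sum((a - b) ** 2 for a, b in zip(pos[i], pos[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness decays with distance
                    pos[i] = [a + beta * (b - a) + alpha * (rng.random() - 0.5)
                              for a, b in zip(pos[i], pos[j])]
                    light[i] = fitness(to_mask(pos[i]))
                    if light[i] > best_score:  # remember the best mask ever seen
                        best_mask, best_score = to_mask(pos[i]), light[i]
        alpha *= 0.95  # shrink the random walk over time
    return best_mask, best_score


def vertical_split(n_features: int, n_groups: int) -> List[List[int]]:
    # Hypothetical grouping: contiguous vertical slices of the column indices.
    size = max(1, math.ceil(n_features / n_groups))
    return [list(range(s, min(s + size, n_features))) for s in range(0, n_features, size)]


def make_toy_data(n: int = 40, d: int = 6, seed: int = 1) -> Tuple[List[List[float]], List[int]]:
    # Hypothetical data: the label depends only on features 0 and 3; the rest are noise.
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [1 if row[0] + row[3] > 0 else 0 for row in X]
    return X, y


def loo_1nn_accuracy(X: List[List[float]], y: List[int], cols: List[int]) -> float:
    # Leave-one-out 1-nearest-neighbour accuracy using only the selected columns.
    if not cols:
        return 0.0
    correct = 0
    for i, xi in enumerate(X):
        nearest = min((j for j in range(len(X)) if j != i),
                      key=lambda j: sum((xi[c] - X[j][c]) ** 2 for c in cols))
        correct += int(y[nearest] == y[i])
    return correct / len(X)


def vsg_firefly(X: List[List[float]], y: List[int], n_groups: int, seed: int = 0) -> List[int]:
    """Run a separate firefly search on each vertical slice and merge the selections."""
    selected: List[int] = []
    for g, cols in enumerate(vertical_split(len(X[0]), n_groups)):
        @lru_cache(maxsize=None)
        def group_fitness(mask: Mask, group_cols: Tuple[int, ...] = tuple(cols)) -> float:
            chosen = [c for c, bit in zip(group_cols, mask) if bit]
            # A small penalty per kept feature favours compact subsets.
            return loo_1nn_accuracy(X, y, chosen) - 0.01 * len(chosen)

        mask, _ = firefly_select(len(cols), group_fitness, seed=seed + g)
        selected.extend(c for c, bit in zip(cols, mask) if bit)
    return selected


X, y = make_toy_data()
print("selected columns:", vsg_firefly(X, y, n_groups=2))
```

Memoizing the per-group fitness with `functools.lru_cache` keeps the sketch cheap, since a group of three columns has only eight possible masks; on real data the fitness would be a full classifier evaluation, and the groups could be processed in parallel.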


Ensemble classifier based big data classification with hybrid optimal feature selection

Pamila, Selvi, Santhi, et al. (2022)

Advances in Engineering Software


Improved meta‐heuristic algorithm for selecting optimal features: A big data classification model

Selvi, Valarmathi, Devadas (2022)

Concurrency and Computation


Many fields work with large databases that contain a high number of features. Feature selection strategies seek to exclude features that are distracting, redundant, or unnecessary, as they can degrade classification results. Existing approaches lack the scalability needed to handle datasets with millions of instances, and they do not obtain favorable results in a timely manner. This study proposes a novel feature selection approach based on an improved optimization model, combined with deep learning‐based data classification. The proposed model has three stages: (a) feature extraction, (b) optimal feature selection, and (c) classification. Initially, the extracted big datasets are handled efficiently by a parallel pool map‐reduce architecture. Features are then extracted from the input big data using feature extraction (FE) approaches such as the proposed Tri‐Kernel principal component analysis (TK‐PCA), linear discriminant analysis, and linear square regression. However, the extracted features may contain irrelevant, out‐of‐date, or noisy data, and the larger feature space raises the computing cost. The best features are therefore selected using a new optimization technique, Levy Adapted SLnO (LA‐SLnO), an improved variant of the original SLnO algorithm. Selecting appropriate features improves classification accuracy. A Convolutional Neural Network (CNN) is used for classification. Finally, a comparative evaluation is carried out to validate the efficiency of the proposed model.
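The abstract does not give the LA‐SLnO update equations, so the following Python sketch only illustrates the general idea of Levy-flight-adapted feature selection. Levy steps are drawn with Mantegna's algorithm, a standard way to generate them. The search rule itself (move a random fraction of the way toward the current best agent, add a scaled Levy jump, accept greedily), the 0.5 threshold for feature masks, and the stand-in fitness (`toy_fitness`, with hypothetical informative features 0 and 2) are assumptions; no sea lion hunting behavior, TK‐PCA extraction, or CNN stage is modeled.

```python
import math
import random
from typing import Callable, List, Tuple

Mask = Tuple[int, ...]


def mantegna_sigma(beta: float) -> float:
    """Scale of the numerator Gaussian in Mantegna's algorithm for Levy-stable steps."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy_step(rng: random.Random, beta: float = 1.5) -> float:
    # Heavy-tailed step: mostly small moves with occasional long jumps.
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def to_mask(position: List[float]) -> Mask:
    # Positions live in [0, 1]; a feature is kept when its coordinate exceeds 0.5.
    return tuple(1 if x > 0.5 else 0 for x in position)


def levy_guided_search(n_dims: int, fitness: Callable[[Mask], float], n_agents: int = 8,
                       n_iter: int = 50, step: float = 0.1, seed: int = 0) -> Tuple[Mask, float]:
    """Hypothetical Levy-guided feature search (not the published LA-SLnO equations)."""
    rng = random.Random(seed)
    agents = [[rng.random() for _ in range(n_dims)] for _ in range(n_agents)]
    scores = [fitness(to_mask(a)) for a in agents]
    k_best = max(range(n_agents), key=lambda k: scores[k])
    best, best_score = agents[k_best][:], scores[k_best]
    for _ in range(n_iter):
        for k in range(n_agents):
            # Move part of the way towards the best agent, plus a Levy jump per coordinate,
            # clipped back into [0, 1].
            cand = [min(1.0, max(0.0, x + rng.random() * (b - x) + step * levy_step(rng)))
                    for x, b in zip(agents[k], best)]
            cand_score = fitness(to_mask(cand))
            if cand_score >= scores[k]:  # greedy acceptance
                agents[k], scores[k] = cand, cand_score
                if cand_score > best_score:
                    best, best_score = cand[:], cand_score
    return to_mask(best), best_score


# Hypothetical stand-in fitness: features 0 and 2 are informative; each extra feature costs 0.1.
INFORMATIVE = {0, 2}


def toy_fitness(mask: Mask) -> float:
    kept = {i for i, bit in enumerate(mask) if bit}
    return len(kept & INFORMATIVE) - 0.1 * len(kept - INFORMATIVE)


mask, score = levy_guided_search(5, toy_fitness)
print("mask:", mask, "score:", score)
```

The heavy tail of the Levy distribution yields mostly small adjustments with occasional long jumps, which is the usual motivation for adding Levy flights to a metaheuristic: the long jumps help the search escape local optima in a large feature space.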

