2020
DOI: 10.1109/access.2020.3028469
A Fast Non-Redundant Feature Selection Technique for Text Data

Abstract: Feature selection is critical in reducing the size of data and improving classifier accuracy by selecting an optimum subset of the overall features. Traditionally, each feature is given a score against a particular category (such as using Mutual Information) and the task of feature selection comes down to choosing the top k ranked features with the best average score across all categories. However, this approach has two major drawbacks. Firstly, the maximum or average score of a feature with a class might not …
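The "traditional" scheme the abstract critiques can be sketched concretely: score each term against each category with mutual information (one-vs-rest), average the per-category scores, and keep the top k. A minimal pure-Python illustration — the toy data and function names are mine, not the paper's:

```python
import math

def mi(docs, labels, feature, cls):
    """Mutual information between presence of `feature` and membership in
    `cls` (one-vs-rest, binary presence model)."""
    n = len(docs)
    score = 0.0
    for has_f in (True, False):
        for in_c in (True, False):
            n_fc = sum(1 for d, l in zip(docs, labels)
                       if (feature in d) == has_f and (l == cls) == in_c)
            n_f = sum(1 for d in docs if (feature in d) == has_f)
            n_c = sum(1 for l in labels if (l == cls) == in_c)
            if n_fc:
                score += (n_fc / n) * math.log(n * n_fc / (n_f * n_c))
    return score

def top_k(docs, labels, k):
    """Average each feature's per-class MI and keep the k best — the
    ranking scheme the abstract describes as traditional."""
    classes = set(labels)
    vocab = {w for d in docs for w in d}
    avg = {w: sum(mi(docs, labels, w, c) for c in classes) / len(classes)
           for w in vocab}
    return sorted(vocab, key=avg.get, reverse=True)[:k]

docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "party"}, {"vote", "law"}]
labels = ["sport", "sport", "politics", "politics"]
print(sorted(top_k(docs, labels, 2)))  # ['goal', 'vote']
```

Note that this ranking considers each feature in isolation: two near-duplicate terms with high scores would both be kept, which is exactly the redundancy problem the paper targets.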

Cited by 21 publications (7 citation statements)
References 56 publications (76 reference statements)
“…The within-class scatter matrix Wid_class and the between-class scatter matrix Bet_class are used to overcome the problems associated with the ideal discrimination projection matrix. Equation (11) gives the formula used to calculate the projection matrix, while Equations (12) and (13) give the formulas for evaluating Bet_class and Wid_class, respectively.…”
Section: Sigmoid Kernel
confidence: 99%
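The quoted Wid_class and Bet_class are, by their names, the within-class and between-class scatter matrices of discriminant analysis; Equations (12)–(13) themselves are not reproduced in this excerpt, so the sketch below uses the standard textbook construction, with toy data:

```python
import numpy as np

def scatter_matrices(X, y):
    """Within-class (Wid_class) and between-class (Bet_class) scatter
    matrices in the classical LDA form."""
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        S_w += (Xc - mean_c).T @ (Xc - mean_c)   # spread inside class c
        diff = (mean_c - mean_all).reshape(-1, 1)
        S_b += len(Xc) * (diff @ diff.T)         # spread of class means
    return S_w, S_b

X = np.array([[1.0, 2.0], [1.2, 1.8], [4.0, 4.2], [3.8, 4.0]])
y = np.array([0, 0, 1, 1])
S_w, S_b = scatter_matrices(X, y)
```

A projection that maximizes between-class scatter relative to within-class scatter then separates the classes while keeping each class compact.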
“…In addition, the formulas for evaluating Bet_class and Wid_class are given by Equations (12) and (13), respectively. The eigenvectors of S are shown in Equation (11). The projection matrix is denoted by the symbol S and its eigenvectors are shown in Equation (14), in which G^T = Bet_class + Wid_class, F_J is the feature vector of the data, α_N is the data vector, and N is the number of samples in data class J.…”
Section: Sigmoid Kernel
confidence: 99%
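Equation (14) is likewise not reproduced in this excerpt, but the step it outlines — taking the leading eigenvectors of a matrix built from Bet_class and Wid_class as the columns of the projection matrix — matches the usual generalized eigenproblem of discriminant analysis. An illustrative sketch with toy matrices (values are mine, not the paper's):

```python
import numpy as np

# Solve S_w^{-1} S_b w = lambda w and keep the leading eigenvector(s)
# as columns of the projection matrix.
S_w = np.array([[0.04, 0.0], [0.0, 0.04]])   # within-class scatter (toy)
S_b = np.array([[7.84, 6.16], [6.16, 4.84]]) # between-class scatter (toy)

eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
projection = eigvecs[:, order[:1]]           # one discriminant direction
```

Here S_b is rank one (two classes), so a single direction — proportional to the difference of the class means — carries all the discriminative information.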
“…This is particularly true for sparse data. Feature selection is a commonly used approach to select only relevant (discriminatory) features [8,9], which helps increase the effectiveness of an algorithm as well as reduce its complexity. This is essential since not all features are equally important, and irrelevant features negatively impact the clusters.…”
Section: Introduction
confidence: 99%
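As a concrete illustration of stripping irrelevant features from sparse text before clustering, a simple unsupervised document-frequency filter can be used — thresholds and data below are illustrative, not from the cited work:

```python
from collections import Counter

def select_relevant(docs, min_df=2, max_df_ratio=0.9):
    """Keep only terms that are neither too rare nor near-ubiquitous:
    a basic unsupervised relevance filter for sparse text data."""
    df = Counter(w for d in docs for w in set(d))  # document frequency
    n = len(docs)
    keep = {w for w, c in df.items() if c >= min_df and c / n <= max_df_ratio}
    return [[w for w in d if w in keep] for d in docs]

docs = [["the", "goal", "match"], ["the", "goal"],
        ["the", "vote"], ["the", "vote", "law"]]
print(select_relevant(docs))  # [['goal'], ['goal'], ['vote'], ['vote']]
```

Singleton terms ("match", "law") and the near-ubiquitous "the" are dropped, leaving only terms frequent enough to shape clusters.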