Composite Feature Extraction and Selection for Text Classification

Wan, Chuan; Wang, Yuling; Liu, Yaoze; Ji, Jinchao; Feng, Guozhong

doi:10.1109/access.2019.2904602

Cited by 30 publications

(14 citation statements)

References 33 publications

“…where the last inequality follows from (7). The above inequality implies that for a fixed $\lambda_t$, $\phi_{\lambda_t}\{W_k\}$ is non-increasing. Moreover, since $f(W)$ is bounded below, it then follows that $\phi_{\lambda_t}\{W_k\}$ is bounded below.…”

Section: Convergence Analysis (mentioning)

confidence: 91%

“…Feature selection has become an essential component in data mining and machine learning because it can reduce the feature size, enhance data understanding, alleviate the effect of the curse of dimensionality, speed up the learning process and improve model's performance. Therefore, it has been widely used in many real-world applications, e.g., text mining [6], [7], pattern recognition [3], and bioinformatics [8], [9].…”

Section: Introduction (mentioning)

confidence: 99%

Robust Multi-class Feature Selection via $l_{2,0}$-Norm Regularization Minimization

Sun, Yu

2020

Preprint

Feature selection is an important data preprocessing step in data mining and machine learning, which can reduce the number of features without degrading a model's performance. Recently, sparse-regression-based feature selection methods have received considerable attention due to their good performance. However, these methods generally cannot determine the number of selected features automatically without a predefined threshold, and tuning that number carefully to get a satisfactory result often costs significant time and effort. To this end, this paper proposes a novel framework that directly solves the $l_{2,0}$-norm regularized least squares problem for multi-class feature selection. The framework produces an exactly row-sparse solution for the weight matrix; the features corresponding to non-zero rows are selected, so the number of selected features is determined automatically. An efficient homotopy iterative hard threshold (HIHT) algorithm is derived to solve this optimization problem and find a stable local solution. In addition, to reduce the computational time of HIHT, an accelerated version (AHIHT) is derived. Extensive experiments on eight biological datasets show that the proposed method achieves higher classification accuracy with the fewest selected features compared with its approximate convex counterparts and state-of-the-art feature selection methods. The robustness of classification accuracy to the regularization parameter is also demonstrated.
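To make the row-sparsity idea in this abstract concrete, here is a minimal sketch of plain iterative hard thresholding for the objective $\frac{1}{2}\|XW - Y\|_F^2 + \lambda\|W\|_{2,0}$. It is not the paper's HIHT or AHIHT: there is no homotopy schedule and no acceleration, and the function names, the fixed step size `1/L`, and the iteration count are assumptions made only for illustration.

```python
import numpy as np


def row_hard_threshold(W, tau):
    """Zero every row of W whose squared l2 norm is at most tau."""
    keep = np.sum(W ** 2, axis=1) > tau
    return W * keep[:, None]


def iht_l20(X, Y, lam, n_iter=300):
    """Plain iterative hard thresholding (illustrative, not HIHT) for
    min_W 0.5 * ||X W - Y||_F^2 + lam * ||W||_{2,0}."""
    # Lipschitz constant of the gradient of the least squares term.
    L = np.linalg.norm(X, 2) ** 2
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = X.T @ (X @ W - Y)
        # Gradient step, then keep only rows with squared norm above 2*lam/L.
        W = row_hard_threshold(W - grad / L, 2.0 * lam / L)
    return W


def selected_features(W):
    """Indices of the non-zero rows of W, i.e. the selected features."""
    return np.flatnonzero(np.any(W != 0, axis=1))
```

The threshold `2 * lam / L` comes from comparing the two choices for each row after the gradient step: keeping the row costs `lam`, while zeroing a row `z` costs `(L / 2) * ||z||^2`. The number of selected features is whatever `selected_features(W)` returns, so it is set by `lam` rather than by a separately tuned count.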

“…Feature selection is a process of selecting a subset of features which are most relevant and informative. Feature selection has been widely researched for many years [1]-[5], and used in many real-world applications, e.g., pattern recognition [3], text mining [6], [7], and bioinformatics [8], [9]. Depending on the availability of ground truth, feature selection can be classified into three categories: supervised, semi-supervised, and unsupervised.…”

Section: Introduction (mentioning)

confidence: 99%

Joint Adaptive Graph and Structured Sparsity Regularization for Unsupervised Feature Selection

Sun, Yu

2020

Preprint

Feature selection reduces the feature dimension while maintaining a model's performance, and has become an important data preprocessing step in many fields. Since obtaining annotated data is laborious or even infeasible in many cases, unsupervised feature selection is more practical in reality. Although many methods have been proposed, they select features independently, so there is no guarantee that the group of selected features is optimal. Moreover, the number of selected features must be tuned carefully to get a satisfactory result. In this paper, we propose a novel unsupervised feature selection method that combines spectral analysis with an $l_{2,0}$-norm regularization term. After optimization, an optimal group of features is selected, and the number of selected features is determined automatically. In addition, a nonnegative constraint on the class indicators is imposed to learn more accurate cluster labels, and a graph regularization term is added to learn the similarity matrix adaptively. An efficient and simple iterative algorithm is designed to solve the proposed problem. Experiments on six different benchmark data sets validate the effectiveness of the proposed approach.

“…reduce their dimensionality. Two types of dimensionality reduction techniques are distinguished [6], [36]: feature extraction and feature selection. Feature extraction methods [7], [43] transform the original variable space to perform dimensionality reduction.…”

Section: Introduction (mentioning)

confidence: 99%

KSUFS: A Novel Unsupervised Feature Selection Method Based on Statistical Tests for Standard and Big Data Problems

Sáez, Corchado

2019

IEEE Access

The typical inaccuracy of data gathering and preparation procedures makes erroneous and unnecessary information a common issue in real-world applications. In this context, feature selection methods are used to reduce the harmful impact of such information on data analysis by removing irrelevant features from datasets. This research presents a novel feature selection method for unsupervised learning, where the difficulty arises from the fact that class labels cannot be used to select the most discriminative features, as is traditionally done in supervised learning. The technique, called Kolmogorov-Smirnov test-based Unsupervised Feature Selection (KSUFS), computes estimated feature distributions and then compares them to the original ones using non-parametric statistical tests to identify the most representative input variables. Two versions of KSUFS are presented in this study: one is designed for standard data, where the accuracy of the method takes precedence over its other aspects; the other is designed for big data problems, where the computational complexity is improved to suit the characteristics of such datasets. KSUFS is compared to other state-of-the-art unsupervised feature selection techniques in a thorough experimental study that considers both standard and big data problems. The results show that the proposed method outperforms the reference unsupervised feature selection methods considered in the comparisons, selecting the most influential features for standard datasets and performing particularly well on big data problems.
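The abstract does not say how KSUFS builds its estimated feature distributions, so the sketch below does not attempt the method itself. It shows only the two-sample Kolmogorov-Smirnov statistic that the method's name refers to, i.e. the largest gap between two empirical distribution functions; the function name `ks_statistic` is hypothetical.

```python
import numpy as np


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
    difference between the empirical CDFs of samples a and b."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    # The empirical CDFs only change at observed values, so it is enough
    # to evaluate both of them on the pooled sample.
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))
```

A value of 0 means the two empirical distributions coincide, and a value of 1 means the samples do not overlap at all. How such a statistic would be turned into a feature ranking is left open here, since the abstract gives no further detail.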
