2019
DOI: 10.1109/access.2019.2904602

Composite Feature Extraction and Selection for Text Classification

Abstract: Although words are the basic semantic units in text, phrases and expressions contain additional information that is important for text classification. To capture this information, traditional algorithms extract composite features via word sequences or co-occurrences, such as bigrams and termsets, but ignore the influence of stop words and punctuation, which results in a large number of weak features. In this paper, we propose a text structure-based algorithm to extract composite features. Termsets that cross punc…
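
The traditional extraction that the abstract criticizes can be sketched in a few lines. The example below is only an illustration, not the paper's algorithm: the stop-word list, the tokenizer, and the helper names (`tokenize`, `bigrams`, `termsets`, `has_stop_word`) are assumptions made for this sketch. It builds bigrams from adjacent words and two-word termsets from document-level co-occurrence, discarding punctuation as the traditional approach does, and then counts how many of the resulting features contain a stop word.

```python
import re
from itertools import combinations

# Tiny illustrative stop-word list (an assumption, not the paper's list).
STOP_WORDS = {"the", "a", "of", "is", "and", "in", "to"}


def tokenize(text):
    """Lowercase the text and keep word tokens only, discarding punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bigrams(tokens):
    """Adjacent word pairs (word-sequence features)."""
    return list(zip(tokens, tokens[1:]))


def termsets(tokens):
    """Unordered pairs of distinct words that co-occur in the document."""
    return list(combinations(sorted(set(tokens)), 2))


def has_stop_word(feature):
    """True if any word in the composite feature is a stop word."""
    return any(word in STOP_WORDS for word in feature)


doc = "The price of oil is rising, and markets react."
tokens = tokenize(doc)

bi = bigrams(tokens)
ts = termsets(tokens)
weak_bi = [f for f in bi if has_stop_word(f)]
weak_ts = [f for f in ts if has_stop_word(f)]

print(f"bigrams:  {len(bi)} total, {len(weak_bi)} contain a stop word")
print(f"termsets: {len(ts)} total, {len(weak_ts)} contain a stop word")
```

Even on this one sentence, most composite features pair a content word with a stop word, and pairs such as `("markets", "rising")` are formed across the comma because punctuation was dropped during tokenization.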

Cited by 30 publications (14 citation statements)
References 33 publications
“…where the last inequality follows from (7). The above inequality implies that for a fixed $\lambda_t$, $\phi_{\lambda_t}\{W_k\}$ is non-increasing and moreover, Since $f(W)$ is bounded below, it then follows that $\phi_{\lambda_t}\{W_k\}$ is bounded below.…”
Section: Convergence Analysis (mentioning)
confidence: 91%
“…Feature selection has become an essential component in data mining and machine learning because it can reduce the feature size, enhance data understanding, alleviate the effect of the curse of dimensionality, speed up the learning process and improve model's performance. Therefore, it has been widely used in many real-world applications, e.g., text mining [6], [7], pattern recognition [3], and bioinformatics [8], [9].…”
Section: Introduction (mentioning)
confidence: 99%
“…Feature selection is a process of selecting a subset of features which are most relevant and informative. Feature selection has been widely researched for many years [1]-[5], and used in many real-world applications, e.g., pattern recognition [3], text mining [6], [7], and bioinformatics [8], [9]. Depending on the existing of ground truth, feature selection can be classified into three categories: supervised, semisupervised, and unsupervised.…”
Section: Introduction (mentioning)
confidence: 99%
“…reduce their dimensionality. Two types of dimensional reduction techniques are distinguished [6], [36]: feature extraction and feature selection. Feature extraction methods [7], [43] transform the original variable space to perform dimensional reduction.…”
Section: Introduction (mentioning)
confidence: 99%
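
The distinction drawn in the last statement, between feature selection and feature extraction, can be illustrated with a minimal sketch. It is not taken from any of the cited papers: the function names, the data, and the weight matrix are hypothetical. Selection keeps a subset of the original columns unchanged, whereas extraction maps every row into a new space (here through a fixed linear transform).

```python
def select_features(rows, keep):
    """Feature selection: keep a subset of the original columns unchanged."""
    return [[row[j] for j in keep] for row in rows]


def extract_features(rows, weights):
    """Feature extraction: map each row into a new space via a linear transform."""
    return [
        [sum(w * x for w, x in zip(weight_row, row)) for weight_row in weights]
        for row in rows
    ]


# Two samples with three original features each (made-up values).
X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

# Selection: columns 0 and 2 survive as they are.
selected = select_features(X, keep=[0, 2])

# Extraction: the first new feature averages columns 0 and 1,
# the second copies column 2 (hypothetical weights).
extracted = extract_features(X, weights=[[0.5, 0.5, 0.0], [0.0, 0.0, 1.0]])

print(selected)
print(extracted)
```

Both calls reduce three features to two, but only the selected columns keep their original meaning; the extracted ones are combinations of the inputs.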