2021
DOI: 10.3233/ida-205154

Efficient n-gram construction for text categorization using feature selection techniques

Abstract: In this paper, we present a novel approach for n-gram generation in text classification. The a-priori algorithm is adapted to prune word sequences by combining three feature selection techniques. Unlike the traditional two-step approach for text classification in which feature selection is performed after the n-gram construction process, our proposal performs an embedded feature elimination during the application of the a-priori algorithm. The proposed strategy reduces the number of branches to be explored, sp…
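The embedded pruning idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the three feature selection techniques, so this sketch assumes two simple stand-ins, a minimum document frequency (`min_df`) and a per-class rate spread (`class_spread`). All function names, parameters, and the toy documents are hypothetical. The a-priori step is the check that an n-gram is only counted when both its (n-1)-gram prefix and suffix survived the previous level, so pruned sequences are never extended.

```python
from collections import Counter, defaultdict


def doc_frequencies(docs, n, allowed=None):
    """Count, per class, the documents that contain each n-gram.

    If `allowed` is given, an n-gram is counted only when both its
    (n-1)-gram prefix and suffix are in `allowed` (the a-priori check).
    """
    df = defaultdict(Counter)  # n-gram -> Counter(label -> document count)
    for tokens, label in docs:
        seen = set()
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            if allowed is not None and (
                gram[:-1] not in allowed or gram[1:] not in allowed
            ):
                continue  # a sub-sequence was pruned, so skip this branch
            seen.add(gram)
        for gram in seen:
            df[gram][label] += 1
    return df


def class_spread(counts, class_sizes):
    """Assumed relevance score: spread of per-class document rates."""
    rates = [counts[label] / size for label, size in class_sizes.items()]
    return max(rates) - min(rates)


def build_ngrams(docs, max_n=3, min_df=2, min_spread=0.5):
    """Grow n-grams level by level, pruning during construction."""
    class_sizes = Counter(label for _, label in docs)
    kept, allowed = {}, None
    for n in range(1, max_n + 1):
        df = doc_frequencies(docs, n, allowed)
        survivors = {}
        for gram, counts in df.items():
            spread = class_spread(counts, class_sizes)
            if sum(counts.values()) >= min_df and spread >= min_spread:
                survivors[gram] = spread
        if not survivors:
            break  # nothing left to extend
        kept.update(survivors)
        allowed = set(survivors)
    return kept


# Toy labelled documents (made up for illustration only).
docs = [
    ("free prize inside click now".split(), "spam"),
    ("win a free prize today".split(), "spam"),
    ("meeting notes for the project".split(), "ham"),
    ("project meeting moved to friday".split(), "ham"),
]
kept = build_ngrams(docs)
```

In this toy run the bigram `("project", "meeting")` is generated as a candidate because both of its words survive level 1, but it is pruned at level 2 for appearing in only one document, so no trigram built on it is ever counted. A post-hoc, two-step pipeline would instead enumerate every n-gram first and filter afterwards.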

Cited by: 15 publications (5 citation statements)
References: 43 publications
“…Text classification methods mainly include word matching method, knowledge engineering, and statistical learning [16], of which word matching method is the earliest proposed method. The classification processing principle of word matching method is to classify according to the class name in the document.…”
Section: Feature Extraction Methods and Text Classification (mentioning)
confidence: 99%
“…Text is an unstructured or semistructured form of data organization, which means that computers are unable to directly process it [14]. Text consists of a collection of characters that have been arranged in a certain order.…”
Section: Text Representation (mentioning)
confidence: 99%
“…In [16], the authors proposed a method to detect spam comments on YouTube by using different machine learning algorithms with the n-gram approach, and they proved that this technique is effective in detecting spam comments. García et al. [17] introduced a system for text classification that executes embedded feature elimination via an a priori algorithm. The aim of their study was to speed up the word sequence constructions by minimizing the explored branches' number as much as possible.…”
Section: Stance Detection (mentioning)
confidence: 99%