2016
DOI: 10.1016/j.ipm.2016.03.007

Helmholtz principle based supervised and unsupervised feature selection methods for text mining

Authors: Tutkan, Melike (Dogus); Akyokuş, Selim (Dogus)

Abstract: One of the important problems in text classification is the high dimensionality of the feature space. Feature selection methods reduce this dimensionality by selecting the most valuable features for classification. Beyond reducing dimensionality, feature selection has the potential to improve a text classifier's performance in terms of both accuracy and time. Furthermore, it helps to build simpler and as a …

Cited by 40 publications (12 citation statements)
References 43 publications
“…In data mining context, the Helmholtz principle states that essential features and interesting events are observed in large deviations from randomness [47]. In text mining research, the Helmholtz principle has been used for document processing and keyword extraction [12], automatic text summarization [48], and supervised and unsupervised feature selection [49]. The primary study [12] dealt with words to extract the meaningful units of a text document, but we deal with concepts instead.…”
Section: Fourth Approach: Selecting Meaningful Features by the Helmholtz Principle
confidence: 99%
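The statement above says that, under the Helmholtz principle, meaningful features are those whose observed frequency deviates sharply from what randomness would predict. The cited papers formulate this with a number-of-false-alarms (NFA) count; as a loose, hypothetical illustration (not the authors' exact formula), a Poisson tail probability can stand in for that surprise:

```python
import math

def meaningfulness(count, doc_len, corpus_count, corpus_len):
    """Toy Helmholtz-style surprise score (hypothetical illustration):
    how unexpected are `count` occurrences of a word in a `doc_len`-token
    document, given its background frequency in a `corpus_len`-token
    corpus? The cited papers use an NFA count; here a Poisson tail
    probability stands in for the same idea."""
    expected = (corpus_count / corpus_len) * doc_len  # count under randomness
    if count <= expected:
        return 0.0  # no deviation from randomness -> not meaningful
    # -log P(X >= count) for X ~ Poisson(expected), truncated sum.
    tail = sum(math.exp(-expected) * expected ** k / math.factorial(k)
               for k in range(count, count + 50))
    return -math.log(max(tail, 1e-300))

# A word seen 8 times in a 200-token document: highly surprising if it is
# rare in the corpus, unsurprising if its corpus rate already predicts 8.
rare = meaningfulness(8, 200, 50, 1_000_000)        # expected ~0.01
common = meaningfulness(8, 200, 40_000, 1_000_000)  # expected = 8
```

Under this sketch, words whose local frequency matches their corpus-wide rate score zero, while locally concentrated words score high — the deviation-from-randomness intuition the citation describes.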
“…Dadaneh et al. (2016) noted that feature selection is one of the most important fields in pattern recognition, aiming to pick a subset of relevant and informative features from an original feature set. Many other researchers have studied feature selection approaches to handle the high-dimensionality problem (Ghareb et al., 2016; Hernández-Pereira et al., 2016; Tutkan et al., 2016; Vinh et al., 2016; Feng et al., 2015; Pinheiro et al., 2015; Chandrashekar and Sahin, 2014; Inbarani et al., 2015; Khalid et al., 2014; Rehman et al., 2015; Javed et al., 2012; Maldonado and Weber, 2009; Wei and Billings, 2007). Although many feature selection approaches have been proposed and employed in various domains, some issues remain, especially in retrieving the relevant documents.…”
Section: Introduction
confidence: 99%
“…However, their results show that Odds Ratio and Information Gain outperformed Chi-square on the 20 Newsgroups dataset. In other similar research, Tutkan et al. (2016) proposed a new feature selection method named Meaning Based Feature Selection (MBFS), and compared its performance with methods such as Information Gain, Chi-squared, and Odds Ratio.…”
Section: Introduction
confidence: 99%
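The baselines named above (Information Gain, Chi-squared, Odds Ratio) all score each term against each class from document counts. As a small self-contained sketch of one of them — the standard chi-square score, not code from the cited paper — a term can be scored from a 2×2 document contingency table:

```python
def chi2_term(n11, n10, n01, n00):
    """Chi-square score of a term for one class, from a 2x2 document
    contingency table: n11 = class docs containing the term,
    n10 = class docs without it, n01 = other docs containing it,
    n00 = other docs without it. Higher = more class-discriminative."""
    n = n11 + n10 + n01 + n00
    den = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if den == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / den

# A term concentrated in one class scores far above one spread evenly.
discriminative = chi2_term(90, 10, 5, 95)
uninformative = chi2_term(50, 50, 50, 50)
```

Feature selection then keeps the top-k terms by score (per class, or aggregated by max/average over classes), which is the comparison setting the citation describes.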
“…Today's large-scale text data, our case study in this research, are also an important high-dimensional application field, where the space regularly contains at least several thousand distinct terms and is very sparse and noisy. We focus on text classification, a useful subdiscipline of data mining and an active research area nowadays. Given a set of text documents with known class labels, text classification intends to predict the class of new instances.…”
Section: Introduction
confidence: 99%