2011 International Symposium on Innovations in Intelligent Systems and Applications 2011
DOI: 10.1109/inista.2011.5946084

Analysis of preprocessing methods on classification of Turkish texts

Authors: Çakırman, Erhan; Ganiz, Murat C.; Akyokuş, Selim; Gürbüz, Mustafa Z. (all Dogus authors)
Conference: 2011 International Symposium on Innovations in Intelligent Systems and Applications (INISTA 2011), Istanbul, Turkey, 15-18 June 2011

Abstract: Preprocessing is an important task and critical step in information retrieval and text mining. The objective of this study is to analyze the effect of preprocessing methods in text classification on Turkish texts. We comp…

Cited by 49 publications (30 citation statements)
References 13 publications
“…Reuters-21578 [3]-[10] and 20Newsgroups [5], [6] datasets, consisting of English text content, are widely used to provide a general evaluation related to applied methods. Datasets which are composed of different sources and languages such as e-mail [4], SMS [4], news text [11], [12], technical paper [9], medical journals [13] and chemical web pages [10] are used to reveal the effect of classification methods on the other languages. Datasets containing Turkish documents are limited in number and they are not regarded as standard datasets yet.…”
Section: A Related Work (mentioning)
confidence: 99%
“…Datasets containing Turkish documents are limited in number and they are not regarded as standard datasets yet. Some of them are as follows; 6-class 2 imbalanced datasets formed with news obtained from RSS source [11], and 5, 6 and 9-class 3 balanced datasets formed with columns and news [12]. Since there is not a standard dataset consisting of Turkish content, the evaluation of effects of the techniques on Turkish content cannot be done.…”
Section: A Related Work (mentioning)
confidence: 99%
“…In order to get good results, this step plays a very important role in our system. The impact of pre-processing in the field of text classification is extensively studied, and research on various languages like Arabic, Turkish, and Portuguese [14], [15], [16] support our motivation behind doing pre-processing at this step. It has already proven that preprocessing takes almost 80% of the total time in classification process [17].…”
Section: A Pre-processing Of Text (mentioning)
confidence: 99%