Çakırman, Erhan (Dogus Author) -- Ganiz, Murat C. (Dogus Author) -- Akyokuş, Selim (Dogus Author) -- Gürbüz, Mustafa Z. (Dogus Author) -- Conference full title: 2011 International Symposium on Innovations in Intelligent Systems and Applications (INISTA 2011), Istanbul, Turkey, 15-18 June 2011. Preprocessing is a critical step in information retrieval and text mining. The objective of this study is to analyze the effect of preprocessing methods on the classification of Turkish texts. We compiled two large datasets from Turkish newspapers using a crawler. On these and two additional datasets, we perform a detailed analysis of preprocessing methods such as stemming, stopword filtering, and word weighting for Turkish text classification. We report the results of extensive experiments. TUBITAK, IEE
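The three preprocessing steps named in the abstract above (stopword filtering, stemming, word weighting) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the stopword set is a three-word toy subset, `toy_stem` only strips the plural suffixes "ler"/"lar" and is not a real Turkish stemmer, and TF-IDF is picked as one common weighting scheme because the abstract does not name the one the authors used.

```python
import math
from collections import Counter

# Toy stopword subset (assumption): "ve" = and, "bir" = a/one, "bu" = this.
STOPWORDS = {"ve", "bir", "bu"}
# Toy suffix list (assumption): only the plural suffixes, not a full stemmer.
SUFFIXES = ("ler", "lar")


def toy_stem(word):
    """Strip a plural suffix if enough of the word remains."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, split on whitespace, drop stopwords, then stem."""
    tokens = text.lower().split()
    return [toy_stem(t) for t in tokens if t not in STOPWORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in tokenized for term in set(doc))
    weights = []
    for doc in tokenized:
        tf = Counter(doc)
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


docs = ["kitaplar ve kalemler", "bu kitap güzel"]
weights = tf_idf(docs)
```

In this toy corpus, "kitaplar" and "kitap" collapse to the same stem, so the term occurs in both documents and receives zero weight, while terms unique to one document keep a positive weight. That interaction between stemming and weighting is the kind of effect such a study would measure.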
Text normalization is an indispensable stage for natural language processing of social media data with available NLP tools. We divide the normalization problem into seven categories: letter case transformation, replacement rules and lexicon lookup, proper noun detection, deasciification, vowel restoration, accent normalization, and spelling correction. We propose a cascaded approach in which each ill-formed word passes through these seven modules and is examined for possible transformations. This paper presents the first results for the normalization of Turkish and tries to shed light on the different challenges in this area. We report an improvement of 40 percentage points over a lexicon lookup baseline and of nearly 50 percentage points over available spelling correctors.
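The cascade idea in the abstract above can be sketched as a chain of stages that a token passes through until it matches a known word. This is a hedged illustration, not the authors' system: only three of the seven modules are mimicked, the lexicon and replacement table are hypothetical toy data, and the stopping rule (stop as soon as the token is in the lexicon) is an assumption.

```python
import itertools

# Hypothetical toy lexicon of well-formed words (not the authors' resource).
LEXICON = {"çok", "güzel", "selam"}
# Hypothetical replacement rule for a common chat abbreviation.
REPLACEMENTS = {"slm": "selam"}
# ASCII letters and the Turkish letters they may stand in for.
ASCII_TO_TURKISH = {"c": "ç", "g": "ğ", "i": "ı", "o": "ö", "s": "ş", "u": "ü"}


def lower_case(token):
    """Letter case transformation (ignores Turkish dotted/dotless I rules)."""
    return token.lower()


def replace_by_rule(token):
    """Replacement rules: map a known abbreviation to its full form."""
    return REPLACEMENTS.get(token, token)


def deasciify(token):
    """Try every ASCII/Turkish letter combination; keep the first lexicon hit.

    Brute force is exponential in token length, so this is only a toy.
    """
    options = [
        (ch, ASCII_TO_TURKISH[ch]) if ch in ASCII_TO_TURKISH else (ch,)
        for ch in token
    ]
    for combo in itertools.product(*options):
        candidate = "".join(combo)
        if candidate in LEXICON:
            return candidate
    return token


STAGES = [lower_case, replace_by_rule, deasciify]


def normalize(token):
    """Pass the token through each stage until it is found in the lexicon."""
    current = token
    for stage in STAGES:
        if current in LEXICON:
            break
        current = stage(current)
    return current


normalized = [normalize(t) for t in ["COK", "guzel", "slm"]]
```

Because each stage receives the output of the previous one, "COK" is first lowercased and only then deasciified. A token that no stage can repair is returned in its last transformed form.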
Smoothing is used; however, in this paper we propose a Wikipedia-based semantic smoothing approach. Our semantic smoothing formulation is based on the work in (Zhou, 2008