Empirical Studies On Machine Learning Based Text Classification Algorithms

Dharmadhikari, S. C.; Ingle, Maya; Kulkarni, Parag

doi:10.5121/acij.2011.2615

Cited by 31 publications (13 citation statements)

References 17 publications


“…We used these five different classifiers since, in machine learning, there is no particular rule as to which classifier will perform best for a given feature set. These classifiers are few of the prominent machine learning classifiers (Dharmadhikari et al., 2011) and were chosen based on their diverse nature of classification. The merit and demerit of each classifier are presented in Table 1.…”

Section: Methods (mentioning, confidence: 99%)

Classification of childhood medulloblastoma into WHO‐defined multiple subtypes based on textural analysis

Das, Mahanta, Ahmed et al., 2020, Journal of Microscopy


Childhood medulloblastoma is a case of a childhood brain tumour that requires close attention due to the low survival rate. Effective prognosis depends a lot on accurate detection of its subtype. The present study proposes a texture-based computer-aided categorization of childhood medulloblastoma samples. According to the World Health Organization, it has four subtypes (desmoplastic, classic, nodular and large). Classification is done in two levels: (i) normal and abnormal and (ii) its four subtypes. The system is evaluated on indigenous patient samples collected from the region. The main objective of database generation is to create a data set of childhood medulloblastoma samples since there exists no available benchmark data set. The proposed framework for automated classification is based on the architectural property and the distribution of cells. Five texture features were extracted for the feature set, namely: grey-level co-occurrence matrix, grey-level run length matrix, first-order histogram features, local binary pattern and Tamura features. The performance of each feature set was evaluated, both individually and in combinations, using five different classifiers. Fivefold cross-validation was used for training and testing the data set. Experiments on both individual feature sets and combinations (best-2, best-3, best-4 and all-5) of feature sets were evaluated based on the accuracy of performance. It was revealed that the combined best-4 feature set resulted in the highest accuracy of 91.3%. The precision, recall and specificity were 0.913, 0.913 and 0.97, respectively. Significantly, it implied that the all-5 feature set is not necessary to have a useful classification. Feature reduction by principal component analysis resulted in increased accuracy of 96.7%.
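The first texture descriptor this abstract lists is the grey-level co-occurrence matrix (GLCM). As a rough illustration of what that descriptor is, the sketch below builds a normalised GLCM for one pixel offset and derives the Haralick contrast statistic from it. It is a minimal sketch under stated assumptions: the image is already quantised to a small number of integer grey levels, only a single horizontal offset (dx=1, dy=0) is counted, pairs are not counted symmetrically, and the names `glcm` and `contrast` are hypothetical. The study does not state which offsets, quantisation, or GLCM statistics it used.

```python
from typing import List

Matrix = List[List[float]]


def glcm(image: List[List[int]], levels: int, dx: int = 1, dy: int = 0) -> Matrix:
    """Normalised grey-level co-occurrence matrix for a single offset (dx, dy).

    Entry [i][j] is the fraction of pixel pairs in which a pixel with grey
    level i has a neighbour at offset (dx, dy) with grey level j.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            # Skip pairs whose neighbour falls outside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    # Convert raw pair counts into joint probabilities.
    return [[n / total for n in row] for row in counts]


def contrast(p: Matrix) -> float:
    """Haralick contrast: sum over (i, j) of p[i][j] * (i - j)^2."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


# A 4x4 image quantised to 4 grey levels (0-3); purely illustrative data.
img = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
P = glcm(img, levels=4)
print(round(contrast(P), 4))  # prints 0.5833
```

Practical GLCM pipelines usually combine several offsets and directions and extract further statistics (for example energy, homogeneity, and correlation) before passing them to a classifier; this sketch shows only the core counting step.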


“…The data was collected from several Arabian scientific encyclopedia in many fields. The accuracy was 91% and 93% for literary and scientific corpus, respectively [9].…”

Section: Related Work (mentioning, confidence: 95%)

“…There are two methods utilized in TC: machine learning in which the text can be classified by using a set of training documents, and rule-based TC which allows the usage of experts, or engineer's knowledge to classify the text [18]. Furthermore, the TC can be used in several applications of computer science such as spam or e-mail filtering, or as an accessible tool for interesting information in particular documents [4], [9].…”

Section: Text Classification (mentioning, confidence: 99%)

A Survey of Arabic Text Classification Models

Al-Sbou, 2018, IJECE


There is a huge amount of Arabic text available online, and it needs to be organized. As a result, there are many natural language processing (NLP) applications concerned with text organization; one of them is text classification (TC). TC helps in dealing with unorganized text by making it easier to classify texts into suitable classes or labels. This paper is a survey of Arabic text classification. It also presents a comparison among different methods for classifying Arabic texts, since Arabic text is complex owing to its vocabulary. Arabic is one of the richest languages in the world and has many linguistic bases. Research on Arabic language processing is very limited compared to English. As a result, these problems pose challenges for the classification and organization of Arabic text. Text classification (TC) helps users access documents or information that have already been classified into one or more specific classes or categories. In addition, document classification helps search engines reduce the number of documents to consider, making searching and matching with queries easier.
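The second citation statement in this entry distinguishes machine-learning TC, which learns from a set of training documents, from rule-based TC, which encodes an expert's or engineer's knowledge, and names spam or e-mail filtering as an application. As a hedged illustration of the rule-based side only, here is a minimal keyword-rule classifier. The labels, keywords, and the function name `classify_by_rules` are hypothetical; a real rule-based system, particularly for a morphologically rich language such as Arabic, would typically need normalisation and stemming that this sketch omits.

```python
from typing import Dict, List

# Hypothetical expert-written rules: each label maps to trigger keywords.
# In a real rule-based system these would come from a domain expert or engineer.
RULES: Dict[str, List[str]] = {
    "spam": ["free money", "winner", "click here"],
    "meeting": ["agenda", "schedule", "minutes"],
}


def classify_by_rules(text: str, rules: Dict[str, List[str]], default: str = "other") -> str:
    """Return the first label (in rule order) whose keywords appear in the text."""
    lowered = text.lower()
    for label, keywords in rules.items():
        # Simple case-insensitive substring matching against each keyword.
        if any(keyword in lowered for keyword in keywords):
            return label
    # No rule fired: fall back to a default class.
    return default


print(classify_by_rules("You are a WINNER, click here!", RULES))  # prints spam
print(classify_by_rules("Please review the agenda", RULES))       # prints meeting
print(classify_by_rules("Lunch tomorrow?", RULES))                # prints other
```

The design trade-off is the one the citation statement implies: rules are transparent and need no training data, but every new class or phrasing requires a human to extend `RULES`, whereas a machine-learning classifier learns those associations from labelled documents.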


“…Due to the surge in the size of data for the past two decades, automation process is required to achieve the goals of information extraction and classification/clustering of data for a variety of purposes. Those include email filtering and routing; news observing; Spam filtering and search engines [20]; newsgroups classification, and survey data grouping [17]. Depending on the nature of the available data, machine learning can be classified to three main categories [10] [21].…”

Section: B. Machine Learning Techniques (mentioning, confidence: 99%)

A Closer Look at Arabic Text Classification

Abdeen, Albouq, Elmahalawy et al., 2019, IJACSA


The world has witnessed an information explosion in the past two decades. Electronic devices such as PCs, laptops, book readers, and mobile devices are now available in many varieties and at relatively affordable prices. Because of this, together with the ubiquitous use of software applications such as social media and cloud applications and the increasing trend towards digitalization, the amount of information on the global cloud has surged to an unprecedented level. Therefore, a dire need exists to mine this massively large amount of data and produce meaningful information. Text classification is one of the well-known and well-established data mining techniques reported in the literature. Text classification methods, including statistical and machine learning algorithms such as Naive Bayesian and Support Vector Machines, have been widely used. Many works have been reported on text classification of various languages, including English, Chinese, Russian, and many others. Arabic is the fifth most spoken language in the world, and there have been many works in the literature on Arabic text classification. However, to the best of our knowledge, there is no recent work that presents a good, critical, and comprehensive survey of Arabic text classification over the past two decades. The aim of this paper is to present a concise yet comprehensive review of Arabic text classification. We have covered over 50 research papers from the past two decades (2000-2019). The main focus of this paper is to address the following issues: 1) the techniques reported in the literature; 2) new techniques; 3) the technique most claimed to be efficient; 4) the datasets used and which ones are most popular; 5) which feature selection techniques are used; 6) the popular classes/categories used; 7) the effect of stemming techniques on classification results.
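Naive Bayes is one of the algorithms this abstract names. The sketch below is a minimal multinomial Naive Bayes text classifier with add-one (Laplace) smoothing, written in plain Python under stated assumptions: whitespace tokenisation, no stemming or stop-word removal, words unseen in training ignored at prediction time, and a toy English training set invented for illustration. `train_nb` and `predict_nb` are hypothetical names, not code from any of the surveyed papers.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Model = Tuple[Counter, Dict[str, Counter], Set[str]]


def train_nb(docs: List[Tuple[str, str]]) -> Model:
    """Train a multinomial Naive Bayes model from (text, label) pairs."""
    class_docs: Counter = Counter()                      # documents per class (for priors)
    word_counts: Dict[str, Counter] = defaultdict(Counter)  # token counts per class
    vocab: Set[str] = set()
    for text, label in docs:
        tokens = text.lower().split()  # assumption: naive whitespace tokenisation
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def predict_nb(model: Model, text: str) -> str:
    """Return the class with the highest log posterior for the text."""
    class_docs, word_counts, vocab = model
    n_docs = sum(class_docs.values())
    best_label, best_score = "", -math.inf
    for label in class_docs:
        score = math.log(class_docs[label] / n_docs)  # log prior P(class)
        total = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothed log likelihood P(token | class).
            score += math.log((word_counts[label][token] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


train = [
    ("the team won the match", "sports"),
    ("the player scored a goal", "sports"),
    ("the bank raised interest rates", "economy"),
    ("markets fell as rates rose", "economy"),
]
model = train_nb(train)
print(predict_nb(model, "rates and markets"))    # prints economy
print(predict_nb(model, "a goal in the match"))  # prints sports
```

Since the abstract lists the effect of stemming among its questions, note that for Arabic the `split()` tokeniser here would in practice be replaced by a normalisation and stemming step; that choice changes the vocabulary and therefore the learned likelihoods.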
