Ismail Hmeidi scite author profile

Text categorization or classification (TC) is concerned with placing text documents in their proper category according to their contents. Given the many applications of TC and the large volume of text documents uploaded to the Internet daily, the need for an automated method stems from the difficulty and tedium of performing this process manually. TC is useful in many fields. For instance, the ability to automatically classify an article or an email into its right class (Arts, Economics, Politics, Sports, etc.) would be appreciated by individual users as well as companies. This paper is concerned with TC of Arabic articles. It compares the five best-known algorithms for TC. It also studies the effects of different Arabic stemmers (light and root-based stemmers) on the effectiveness of these classifiers. Furthermore, a comparison between two data mining software tools (Weka and RapidMiner) is presented. The results illustrate the good accuracy provided by the SVM classifier, especially when used with the light10 stemmer. This outcome can be used in the future as a baseline for comparison with other unexplored classifiers and Arabic stemmers.


Design and implementation of automatic indexing for information retrieval with Arabic documents

Hmeidi

Kanaan

Evens

1997

J. Am. Soc. Inf. Sci.


We have put together a corpus of 242 abstracts of Arabic documents using the Proceedings of the Saudi Arabian National Conferences as a source. All these abstracts involve computer science and information systems. We also designed and built an automatic information retrieval system from scratch to handle Arabic data. The system was implemented in the C language using the GCC compiler and runs on IBM/PCs and compatible microcomputers. We have implemented both automatic and manual indexing techniques for this corpus. A long series of experiments using measures of recall and precision has demonstrated that automatic indexing is at least as effective as manual indexing and more effective in some cases. Since automatic indexing is both cheaper and faster, our results suggest that we can achieve a wider coverage of the literature with less money and produce as good results as with manual indexing. We have also compared the retrieval results using words as index terms versus stems and roots, and confirmed the results obtained by Al‐Kharashi and Abu‐Salem with smaller corpora that root indexing is more effective than word indexing. © 1997 John Wiley & Sons, Inc.
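The recall and precision measures mentioned in this abstract can be illustrated with a minimal sketch. The function name and the document IDs below are hypothetical and are not taken from the paper; the code only shows the standard set-based definitions for a single query.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document IDs returned by the system
    relevant:  document IDs judged relevant for the query
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four documents retrieved, three judged relevant.
p, r = precision_recall([1, 2, 3, 4], [2, 3, 5])
print(f"precision={p:.2f} recall={r:.2f}")
```

Comparing two indexing methods in this style would mean running the same queries against each index and comparing the resulting precision and recall values.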


On authorship authentication of Arabic articles

Alwajeeh

Al‐Ayyoub

Hmeidi

2014


A new enhancement to the R-tree node splitting

Al-Badarneh

Yaseen

Hmeidi

2009

Journal of Information Science


The performance of spatial queries depends mainly on the underlying index structure used to handle them. R-tree, a well-known spatial index structure, suffers largely from high overlap and high coverage resulting mainly from splitting the overflowed nodes. Assigning the remaining entries to the underflow node in order to meet the R-tree minimum fill constraint (the Remaining Entries problem) may induce high overlap or high coverage. This is done without considering the geometric features of the remaining entries, and it may cause a poorly optimized expansion of that particular node. This paper presents a solution to this problem. The proposed solution distributes rectangles as follows: (1) assign to the first node the m entries nearest to the first seed; (2) assign to the second node another m entries, those nearest to the second seed; (3) assign the remaining entries one by one to the nearest seed. Several experiments on real data, as well as synthetic data, show that the proposed splitting algorithm outperforms the efficient version of the original R-tree in terms of query performance.
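The three distribution steps described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say how seeds are chosen or how "nearest" is measured, so the sketch takes the two seeds as given entry indices, counts each seed toward its node's m entries, and uses the Euclidean distance between rectangle centers. The names `split_node`, `center`, and `dist` are hypothetical.

```python
import math


def center(rect):
    """Center point of a rectangle given as (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = rect
    return ((xmin + xmax) / 2.0, (ymin + ymax) / 2.0)


def dist(a, b):
    """Assumed distance: Euclidean distance between rectangle centers."""
    (ax, ay), (bx, by) = center(a), center(b)
    return math.hypot(ax - bx, ay - by)


def split_node(entries, i, j, m):
    """Split entries into two nodes around the seeds entries[i] and entries[j].

    m is the minimum fill; each seed is assumed to count toward its node's m.
    """
    seed1, seed2 = entries[i], entries[j]
    node1, node2 = [seed1], [seed2]
    pool = [e for k, e in enumerate(entries) if k not in (i, j)]

    # Step 1: fill the first node up to m entries nearest to the first seed.
    pool.sort(key=lambda e: dist(e, seed1))
    node1 += pool[:m - 1]
    pool = pool[m - 1:]

    # Step 2: fill the second node up to m entries nearest to the second seed.
    pool.sort(key=lambda e: dist(e, seed2))
    node2 += pool[:m - 1]
    pool = pool[m - 1:]

    # Step 3: assign each remaining entry to the node of its nearest seed.
    for e in pool:
        (node1 if dist(e, seed1) <= dist(e, seed2) else node2).append(e)
    return node1, node2


# Hypothetical data: two clusters plus one rectangle in between.
rects = [(0, 0, 1, 1), (1, 0, 2, 1), (0, 1, 1, 2),
         (9, 9, 10, 10), (8, 9, 9, 10), (9, 8, 10, 9), (4, 4, 5, 5)]
left, right = split_node(rects, 0, 3, m=2)
print(left)
print(right)
```

Because step 3 looks at each remaining entry's position instead of assigning all leftovers to the underfull node, entries from the far cluster stay out of the first node in this example.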


Extracting the roots of Arabic words without removing affixes

Yaseen

Hmeidi

2014

Journal of Information Science


Most research in Arabic root extraction focuses on removing affixes from Arabic words. This process adds processing overhead and may remove non-affix letters, which leads to the extraction of incorrect roots. This paper devises a new approach to dealing with this issue by introducing a new algorithm for extracting the roots of Arabic words. The proposed algorithm, which is called the Word Substring Stemming (WSS) Algorithm, does not remove affixes during the extraction process. Rather, it is based on producing the set of all substrings of an Arabic word, and uses the Arabic roots file, the Arabic patterns file and a concrete set of rules to extract correct roots from substrings. The experiments have shown that the proposed approach is competitive and that its accuracy is 83.9%. Furthermore, its accuracy can be enhanced further, in the sense that, for about 9.9% of the tested words, the WSS algorithm retrieves two candidates (in most cases) for the correct root.
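The core idea of enumerating substrings and checking them against a list of known roots can be sketched as below. This is a simplified illustration, not the WSS algorithm itself: it assumes contiguous substrings, uses a tiny hypothetical roots set in place of the Arabic roots file, and omits the patterns file and the rule set mentioned in the abstract. The function names are hypothetical.

```python
def substrings(word):
    """All contiguous substrings of word, shortest first."""
    return [word[i:i + n]
            for n in range(1, len(word) + 1)
            for i in range(len(word) - n + 1)]


def candidate_roots(word, roots):
    """Substrings of word that appear in the roots set, without duplicates."""
    found = []
    for s in substrings(word):
        if s in roots and s not in found:
            found.append(s)
    return found


# Hypothetical stand-in for the roots file.
ROOTS = {"كتب", "درس", "علم"}

print(candidate_roots("مكتبة", ROOTS))   # word meaning "library"
print(candidate_roots("مدرسة", ROOTS))   # word meaning "school"
```

When more than one substring matches, the function returns several candidates, which mirrors the case in the abstract where two candidate roots are retrieved for a word.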


A novel approach to the extraction of roots from Arabic words using bigrams

Hmeidi

Al-Shalabi

Al-Taani

et al. 2009

J. Am. Soc. Inf. Sci.


Root extraction is one of the most important topics in information retrieval (IR), natural language processing (NLP), text summarization, and many other important fields. In the last two decades, several algorithms have been proposed to extract Arabic roots. Most of these algorithms dealt with triliteral roots only, and some with fixed-length words only. In this study, a novel approach to the extraction of roots from Arabic words using bigrams is proposed. Two measures are used: a dissimilarity measure, the "Manhattan distance," and Dice's measure of similarity. The proposed algorithm is tested on the Holy Qur'an and on a corpus of 242 abstracts from the Proceedings of the Saudi Arabian National Computer Conferences. The two files used contain a wide range of data: the Holy Qur'an contains most of the ancient Arabic words, while the other file contains some modern Arabic words and some words borrowed from foreign languages in addition to the original Arabic words. The results of this study showed that combining N-grams with the Dice measure gives better results than using the Manhattan distance measure.
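The two measures named in this abstract can be sketched on character bigrams as follows. The sketch assumes bigrams are pairs of adjacent characters and that a word is matched to the root with the highest Dice score; the paper may construct bigrams and select roots differently. The roots list and function names are hypothetical.

```python
from collections import Counter


def bigrams(s):
    """Adjacent character pairs of s (an assumed bigram definition)."""
    return [s[i:i + 2] for i in range(len(s) - 1)]


def dice(a, b):
    """Dice similarity: 2 * |A & B| / (|A| + |B|) over bigram sets."""
    A, B = set(bigrams(a)), set(bigrams(b))
    if not A and not B:
        return 0.0
    return 2 * len(A & B) / (len(A) + len(B))


def manhattan(a, b):
    """Manhattan distance between bigram count vectors (lower is closer)."""
    ca, cb = Counter(bigrams(a)), Counter(bigrams(b))
    return sum(abs(ca[g] - cb[g]) for g in set(ca) | set(cb))


def best_root(word, roots):
    """Pick the root with the highest Dice similarity to the word."""
    return max(roots, key=lambda r: dice(word, r))


# Hypothetical roots list and test word.
roots = ["كتب", "درس", "علم"]
word = "مكتبة"
print(best_root(word, roots), dice(word, "كتب"), manhattan(word, "كتب"))
```

Dice is a similarity (higher is better) while the Manhattan distance is a dissimilarity (lower is better), so a selection step based on the latter would use `min` instead of `max`.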


Automatic categorization of Arabic articles based on their political orientation

Abooraig

AlZu’bi

Kanan

et al. 2018

Digital Investigation



Ismail Hmeidi

Performance of KNN and SVM classifiers on full word Arabic articles

Automatic Arabic text categorization: A comprehensive comparative study
