Stemming versus Light Stemming for measuring the similarity between Arabic Words with Latent Semantic Analysis model

Froud, Hanane; Lachkar, Abdelmonaime; Ouatik, Said Alaoui

doi:10.1109/cist.2012.6388065

Cited by 15 publications

(8 citation statements)

References 5 publications


“…So far, the research studies presented by Froud et al (2010) and Froud, Lachkar & Ouatik (2012b) are the only works that have investigated the effect of using stemming on semantic similarity of Arabic text. Froud et al (2010) investigated diverse similarity measures with document clustering and they applied stemming to words which have reduced documents representation and provided fast clustering.…”

Section: Related Work (mentioning)

confidence: 99%

“… Froud et al (2010) investigated diverse similarity measures with document clustering and they applied stemming to words which have reduced documents representation and provided fast clustering. Froud, Lachkar & Ouatik (2012b) tested the effect of using stemming and light stemming on the semantic similarity between Arabic words. The similarity is measured by Latent Semantic Analysis (LSA) and computed by using different measures.…”

Section: Related Work (mentioning)

confidence: 99%
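The statements above describe measuring similarity between Arabic words with Latent Semantic Analysis. As a rough illustration only, the sketch below builds a small term-by-document count matrix, reduces it with a truncated SVD, and compares term vectors by cosine similarity. The function names, the toy transliterated tokens, and the choice of raw counts with cosine similarity are assumptions made for this example; the cited papers do not specify their weighting scheme, rank, or measures here.

```python
import numpy as np

def lsa_term_vectors(docs, k=2):
    """Return (term -> row index, rank-k term vectors) for tokenized docs.

    Assumption: raw term counts and a plain truncated SVD; real LSA setups
    often apply tf-idf or log-entropy weighting first.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {term: i for i, term in enumerate(vocab)}
    # Term-by-document count matrix (rows: terms, columns: documents).
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc:
            counts[index[tok], j] += 1
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    k = min(k, len(s))
    # Scale the left singular vectors by the singular values.
    return index, u[:, :k] * s[:k]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

# Hypothetical, already-stemmed toy documents (transliterated placeholders).
docs = [["ktb", "qra"], ["ktb", "qra", "drs"], ["lab", "krh"]]
index, vectors = lsa_term_vectors(docs, k=2)
sim_same = cosine(vectors[index["ktb"]], vectors[index["qra"]])
sim_diff = cosine(vectors[index["ktb"]], vectors[index["lab"]])
```

In this toy corpus, "ktb" and "qra" occur in exactly the same documents, so their similarity is close to 1, while "ktb" and "lab" never co-occur and score close to 0. Stemming or light stemming would change which surface forms collapse into the same row of the matrix, which is the effect the cited study examines.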


Effect of stemming on text similarity for Arabic language at sentence level

Alhawarat; Abdeljaber; Hilal (2021). PeerJ Computer Science

Semantic Text Similarity (STS) has several important applications in the field of Natural Language Processing (NLP). The aim of this study is to investigate the effect of stemming on text similarity for the Arabic language at the sentence level. Several Arabic light and heavy stemmers as well as lemmatization algorithms are used in this study, for a total of 10 algorithms. Standard training and testing data sets from the SemEval-2017 international workshop, Task 1, Track 1 Arabic (ar–ar), are used. Different features are selected to study the effect of stemming on text similarity based on different similarity measures. Traditional machine learning algorithms are used, such as Support Vector Machines (SVM), Stochastic Gradient Descent (SGD) and Naïve Bayesian (NB). Compared to the original text, using the stemmed and lemmatized documents in experiments achieves enhanced Pearson correlation results. The best results were attained when using the Arabic light stemmer (ARLSTem) and Farasa light stemmers, the Farasa and Qalsadi lemmatizers, and the Tashaphyne heavy stemmer. The best enhancement was about 7.34% in Pearson correlation. In general, stemming considerably improves the performance of sentence text similarity for the Arabic language. However, some stemmers make results worse than those for the original text: the Khoja heavy stemmer and the AlKhalil light stemmer.


“…We choose to discuss the algorithm of [3] since it is one of the most successful approaches to Arabic stemming [6]. The Light10 algorithm, a modified version of Light8 developed by [3], has outperformed most morphological analyzers and aims to improve information retrieval performance [13,10,12,17]. [3] constructed the stemmers "suffix 1 and prefix 2" based on heuristics. The Light10 stemming algorithm steps are: Remove "و" for V1, light 2, light 3, light 8, and light10 if the remainder of the word is 3 or more characters long.…”

Section: Stemmer Based On Suffix and Prefix (mentioning)

confidence: 99%
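The single step quoted above (strip a leading "و" only when at least three characters remain) can be sketched as follows. This is a minimal illustration of that one rule, not the full Light10 stemmer: the function name and the `min_remainder` parameter are hypothetical, and the sketch assumes undiacritized input so that character counts match letter counts.

```python
WAW = "\u0648"  # Arabic letter waw, used as the conjunction "and"

def strip_leading_waw(word, min_remainder=3):
    """Drop a leading waw if at least `min_remainder` characters remain.

    Only this one step is shown; the article- and suffix-removal steps of
    light stemmers are deliberately left out.
    """
    if word.startswith(WAW) and len(word) - 1 >= min_remainder:
        return word[1:]
    return word

# Four letters remain after the waw, so it is stripped.
stripped = strip_leading_waw("\u0648\u0643\u062a\u0627\u0628")
# Only two letters would remain, so the word is left unchanged.
kept = strip_leading_waw("\u0648\u0639\u062f")
```

The length guard is what makes this a "light" rule: short words that merely begin with waw are kept intact rather than being reduced below three letters.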

Arabic stemming techniques: Comparisons and new vision

Al-Zyoud; Al-Rabayah (2015). 2015 IEEE 8th GCC Conference & Exhibition

Arabic information extraction processes have become a popular area of research. Many methods and approaches have been designed, and algorithms introduced, to solve the problem of morphology and stemming of the Arabic language. Each researcher proposed his own standards, testing methodology and accuracy measurements to test his algorithm. Therefore, we cannot make an exact comparison between these algorithms. However, this research goes over stemming processes by explaining and discussing Arabic language characteristics and the difficulties of stemming it; comparing root-based stemming, suffix- and prefix-based stemming, and translation-based stemming against each other; presenting a modified stemming algorithm which helps cover some words missed by other algorithms; and finally, presenting a new vision of Arabic stemming techniques.


“…Classification: [21], [22], [23], [24], [25], [26], [27], [28], [29], [30], [31], [32], [33], [34], [35], [36], [37], [38], [39], [40], [41], [42], [43], [44], [45], [46], [47], [18], [48], [49], [50]; 31 (29%)
Stemming and Lemmatization: [51], [52], [53], [54], [55], [4], [56], [57], [58], [59], [60], [61]; 12 (11%)
Information Retrieval: [9], [62], [63], [64], [65], [66], [67], …”

Section: Technique (mentioning)

confidence: 99%

Arabic Text Mining a Systematic Review of the Published Literature 2002-2014

AlMahmoud; Al-Razgan (2015). 2015 International Conference on Cloud Computing (ICCC)

Text mining is a set of techniques that analyze large masses of data, extract relations that are unknown beforehand, and provide solutions to help decision-making. Text mining has been used extensively to analyze English text. However, it has only recently been used to analyze Arabic text. The objective of this paper is therefore to present the current state of Arabic text mining. A systematic review has been performed to collect the papers published on the analysis of Arabic text mining. More than one hundred papers from different reliable sources were used in our review; they were classified according to their specific domain, and classified again according to the specific techniques used. This paper also provides a quantitative analysis of publications according to publication type, year, category, and contributors.
