The Effect of Preprocessing on Arabic Document Categorization

Ayedh, Abdullah Mohammed; Tan, Guozhen; Alwesabi, Khaled; Rajeh, Hamdi

doi:10.3390/a9020027

Cited by 60 publications

(36 citation statements)

References 29 publications


“…Moreover, Hmeidi et al [12] studied the effect of using raw text, the Khoja root-based stemmer and light stemming on the classification of Arabic text documents with standard classifiers such as NB, SVM, KNN, J48 and Decision Table. The results showed that the SVM and NB classifiers with light stemming provide better classification accuracy than the other classifiers. The same conclusion was drawn by Al-Badarneh [13] and Ayedh et al [14] using various pre-processing methods. Additionally, Al-Molegi et al [15] and Khreisat [16] proposed approaches that classify Arabic text documents by combining N-grams with similarity measures, including the Manhattan distance, the Euclidean distance and the Dice coefficient.…”

Section: Related Work (supporting)

confidence: 78%
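The N-gram approach attributed to Al-Molegi et al and Khreisat can be illustrated with a minimal sketch. It is not their published method: it assumes character trigram frequency profiles, one profile per category, and assigns a document to the category whose profile has the highest Dice similarity. Manhattan and Euclidean distances are included for comparison (for these, smaller means more similar). The function names and the choice n = 3 are assumptions made for illustration.

```python
from collections import Counter
import math


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Character n-gram frequency profile; the text is padded with spaces so word edges count."""
    padded = f" {text} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def dice(a: Counter, b: Counter) -> float:
    """Dice coefficient over the sets of distinct n-grams: 2|A & B| / (|A| + |B|)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def manhattan(a: Counter, b: Counter) -> float:
    """Sum of absolute frequency differences (smaller = more similar)."""
    keys = set(a) | set(b)
    return float(sum(abs(a[k] - b[k]) for k in keys))


def euclidean(a: Counter, b: Counter) -> float:
    """Square root of the summed squared frequency differences (smaller = more similar)."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))


def classify(doc: str, profiles: dict[str, Counter], n: int = 3) -> str:
    """Return the category whose n-gram profile has the highest Dice similarity to doc."""
    doc_profile = char_ngrams(doc, n)
    return max(profiles, key=lambda cat: dice(doc_profile, profiles[cat]))
```

In use, each category profile would be built by calling `char_ngrams` on the concatenated training documents of that category; swapping `dice` for a distance measure in `classify` would require `min` instead of `max`.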

Untitled

2017

IJAIA


Untitled

2017

IJAIA


“…Examples of such insignificant words include pronouns, articles, conjunctions, prepositions, demonstratives and interrogatives. In addition, Arabic-specific nouns denoting place and time, as well as symbols (@, #, &, %, *), are considered insignificant and can be removed (Ayedh et al, 2016). … (2012) was used and updated by preventing the removal of certain stop-words in documents.…”

Section: Stop-word Removal (mentioning)

confidence: 99%
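The stop-word and symbol removal step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the stop-word set is a small hypothetical sample with a few common Arabic words per category named in the snippet, not the list used by Ayedh et al. (2016), and tokenization is plain whitespace splitting, so attached clitics such as the conjunction "و" are not handled.

```python
# Hypothetical sample stop-word list (a few common words per category);
# NOT the list used by Ayedh et al. (2016).
STOP_WORDS = {
    "هو", "هي", "هم",            # pronouns
    "أو", "ثم",                  # conjunctions
    "في", "من", "على", "إلى",    # prepositions
    "هذا", "هذه", "ذلك",         # demonstratives
    "ماذا", "متى", "أين",        # interrogatives
    "فوق", "تحت", "قبل", "بعد",  # nouns of place and time
}
SYMBOLS = "@#&%*"


def remove_stop_words(text: str) -> list[str]:
    """Drop the listed symbols, split on whitespace, and filter out stop words."""
    # Map each symbol to a space so it cannot glue neighbouring tokens together
    cleaned = text.translate({ord(ch): " " for ch in SYMBOLS})
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]
```

For example, "ذهب الطالب إلى المدرسة # في الصباح" is reduced to the four content tokens ذهب, الطالب, المدرسة and الصباح.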

“…2. Finally, a character carrying the shadda symbol ("ّ") can be replaced by two copies of the same character; because these characters are used to extract Arabic roots, the symbol is eliminated to prevent it from affecting the meaning of the words (Ayedh et al, 2016).…”

Section: Normalization (mentioning)

confidence: 99%
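The shadda rule can be sketched as a small normalization function. Shadda (U+0651) marks a doubled consonant, so each occurrence is replaced by a second copy of the letter it sits on. As an assumption not stated in the snippet, the other short-vowel marks (U+064B to U+0650 and U+0652) are stripped first, so that the character preceding a shadda is always a base letter regardless of whether a fatha or similar mark is written before or after it. The function name is hypothetical.

```python
import re

SHADDA = "\u0651"
# Other harakat: tanween forms, fatha, damma, kasra (U+064B..U+0650) and sukun (U+0652)
OTHER_HARAKAT = re.compile("[\u064B-\u0650\u0652]")


def expand_shadda(word: str) -> str:
    """Replace each shadda with a second copy of the letter it sits on."""
    # Assumption: strip the other short-vowel marks first, so the character
    # before a shadda is always a base letter.
    word = OTHER_HARAKAT.sub("", word)
    out: list[str] = []
    for ch in word:
        if ch == SHADDA:
            if out:  # duplicate the preceding letter
                out.append(out[-1])
            # a shadda with no preceding letter is simply dropped
        else:
            out.append(ch)
    return "".join(out)
```

For example, a word written as mim, dal + shadda, ra, sin becomes mim, dal, dal, ra, sin, which gives a root extractor both consonants explicitly.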


Arabic Poetry Authorship Attribution using Machine Learning Techniques

Ahmed, Ramdani, Bellafkih

2019

Journal of Computer Science


In this study, authorship attribution is performed on Arabic poetry to determine the author of a given text once documents of known authorship have been assigned. This work also measures the performance of Naïve Bayes, Support Vector Machine and Linear Discriminant Analysis for Arabic poetry authorship attribution using text-mining classification. Several feature types (lexical, character, structural, poetry, syntactic, semantic and specific-word features) are used as input to the classification algorithms (Linear Discriminant Analysis, Support Vector Machine and Naïve Bayes) within the Arabic Poetry Authorship Attribution Model (APAAM). The Arabic poetry dataset is divided into two sets: poetic texts of known authorship in a training set and anonymous poetic texts in a test set. The experiment uses 114 randomly selected poets from entirely different eras. The highest accuracy achieved is 99.12%; the performance rate at the attribute level is 98.246%, and at the technique level it is 92.836%.


“…All stop words and symbols are removed, and stemming is applied to the user queries, before the tokenization process takes place [14]. Tokenization then produces query keywords based on significant concept words.…”

Section: Tokenization (mentioning)

confidence: 99%
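The query preprocessing described above (removing stop words and symbols, then stemming) can be sketched as a short pipeline. Everything here is a hypothetical illustration, not the method of the cited work [14]: the stop-word set is a small sample, the prefix and suffix lists are illustrative affixes in the spirit of Arabic light stemming, and whitespace splitting is used so the stop-word check can operate on individual words, since the snippet does not specify token boundaries.

```python
# Hypothetical stop words and affix lists, for illustration only; they are
# NOT taken from the cited work [14].
STOP_WORDS = {"في", "من", "على", "إلى", "عن", "هذا", "هذه"}
SYMBOLS = "@#&%*"
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "لل")          # longest first
SUFFIXES = ("ات", "ان", "ون", "ين", "ها", "ية", "ة", "ه")    # two-letter first


def light_stem(word: str) -> str:
    """Strip at most one prefix and one suffix, keeping a stem of at least 2 letters."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= 2:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= 2:
            word = word[:-len(s)]
            break
    return word


def preprocess_query(query: str) -> list[str]:
    """Remove symbols and stop words from a query, then light-stem the remaining terms."""
    cleaned = query.translate({ord(ch): " " for ch in SYMBOLS})
    return [light_stem(tok) for tok in cleaned.split() if tok not in STOP_WORDS]
```

Light stemming of this kind only trims affixes rather than reducing a word to its triliteral root, which is the distinction the related work above draws between light stemming and root-based stemmers such as Khoja's.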

Evaluation on knowledge extraction and machine learning in resolving Malay word ambiguity

Yahaya, Rahman, Bakar et al.

2018

J. Fundam. Appl. Sci.


The involvement of linguistic professionals in resolving the ambiguity of a word within a particular context produces a concise meaning for the words found in the lexical knowledge-based collection. Motivated by this issue, we employed a lexical-knowledge and machine-learning approach that integrates data and/or information from the lexical knowledge base, namely the Malay collections linked to the ambiguous words. We show that the proposed method improves precision in resolving ambiguity.
