Extracting the roots of Arabic words without removing affixes

Yaseen, Qussai; Hmeidi, Ismail

doi:10.1177/0165551514526348

Cited by 16 publications

(13 citation statements)

References 10 publications


“…The stemming process involves the extraction of a word root to enhance the classifier accuracy by merging many word forms into one root form [37]. The Arabic language has a composite morphology structure that makes root extraction more complicated and limits the stemming to removing prefixes and suffixes [38].…”

Section: B Preprocessing (mentioning)

confidence: 99%

“…However, there are several algorithms can simplify extracting roots. These algorithms follow some rules for removing prefixes and suffixes to produce proper stemming, such as the AlKabi [39], Ghawanmeh [40], Hmeidi [41], Khoja [42] and WSS-Based algorithms [37]. The Light10 stemmer [43], which is claimed to be the best available stemmer, works by solely removing the initial letter (و), prefix (ال, وال, بال, كال, فال, لل), and suffix (ها, ان, ات, ون, يه, ية, ه, ة, ي), and this may not result in an accurate root extraction.…”

Section: B Preprocessing (mentioning)

confidence: 99%
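
The affix stripping described in the statement above can be sketched in a few lines of Python. This is an illustration only, not the Light10 implementation: the affix lists are copied from the quoted statement, while the function name `light_stem`, the minimum-length thresholds, and the choice to strip at most one prefix and one suffix are assumptions made for the example.

```python
# Affix lists as given in the quoted citation statement (not verified
# against the original Light10 description).
PREFIXES = ("ال", "وال", "بال", "كال", "فال", "لل")
SUFFIXES = ("ها", "ان", "ات", "ون", "يه", "ية", "ه", "ة", "ي")

MIN_AFTER_WAW = 3  # assumption: letters that must remain after dropping و
MIN_STEM_LEN = 2   # assumption: letters that must remain after an affix


def light_stem(word: str) -> str:
    """Strip the initial و, then one prefix, then one suffix (simplified)."""
    # Step 1: drop a leading و only if enough letters remain.
    if word.startswith("و") and len(word) - 1 >= MIN_AFTER_WAW:
        word = word[1:]

    # Step 2: remove at most one prefix, trying longer prefixes first.
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            word = word[len(prefix):]
            break

    # Step 3: remove at most one suffix, trying longer suffixes first.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            word = word[:-len(suffix)]
            break

    return word


if __name__ == "__main__":
    for w in ("والكتاب", "المعلمون", "مدرسة"):
        print(w, "->", light_stem(w))
```

Under these assumptions, `light_stem("المعلمون")` returns `معلم`, which is a stem rather than the triliteral root ع ل م. That gap is the limitation the statement points to when it says affix removal alone may not yield an accurate root.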


ATAM: Arabic Traffic Analysis Model for Twitter

AlFarasani, AlHarthi, Alhumoud

2019

ijacsa


Harvesting Twitter for insight and meaning in what is called sentiment analysis (SA) is a major trend stemming from computational linguistics and AI. Industry and academia are interested in maximizing efficiency while mining text to attain the most currently available data and crowdsourcing opinions. In this study, we present the ATAM model for traffic analysis using the data available on Twitter. The model comprises five components that start with data streaming and collection and end with road incident prediction through classification. The data are classified using a lexicon-based method. The predicted classes are as follows: safe, needs attention, dangerous, and neutral. The data were collected for three months in the city of Riyadh, Saudi Arabia. The model was applied to 10k tweets and achieved an overall accuracy of 82% across all four classes.


“…The approach adopted can be summarized in three stages. In the first stage, a Region CNN (RCNN) [29] is used to map image objects to Arabic root words by the aid of a transducer based algorithm for Arabic root extraction [30]. After that, stage two uses a word based RNN with LSTM memory cell to generate the most appropriate words for an image in Modern Standard Arabic (MSA).…”

Section: B Image Caption For Arabic Language (mentioning)

confidence: 99%

Automatic Arabic Image Captioning using RNN-LSTM-Based Language Model and CNN

Almuzaini, Al-yahya, Benhidour

2018

ijacsa


The automatic generation of syntactically and semantically correct image captions is an essential problem in Artificial Intelligence. The existence of large image caption corpora such as Flickr and MS COCO has contributed to the advance of image captioning in English. However, progress still lags for Arabic given the scarcity of image caption corpora for the Arabic language. In this work, an Arabic version of part of the Flickr and MS COCO caption datasets is built. Moreover, a generative merge model for Arabic image captioning based on a deep RNN-LSTM and CNN model is developed. The results of the experiments are promising and suggest that the merge model can achieve excellent results for Arabic image captioning if a larger corpus is used.


“…To achieve this, at any given time when English labels of objects were used in training of the convolution neural network, Arabic root-words of the object were also given as input in the training phase. (Yaseen and Hmeidi, 2014; Yousef et al., 2014) proposed the well-known transducer based algorithm for Arabic root extraction which is used to extract root-words from an Arabic word in the training stage. Given the Arabic influence on root-words and the limited 4 verb prefixes, 12 noun prefixes and 20 common suffixes, the approach is optimized for initial training.…”

Section: Image Fragments To Root-words Using Dnn (mentioning)

confidence: 99%

Generating Image Captions in Arabic using Root-Word Based Recurrent Neural Networks and Deep Neural Networks

Jindal

2018

Proceedings of the 2018 Conference of the North American Chapter Of the Association for Computational Linguistics: St


Image caption generation has gathered widespread interest in the artificial intelligence community. Automatic generation of an image description requires both computer vision and natural language processing techniques. While there has been advanced research in English caption generation, research on generating Arabic descriptions of an image is extremely limited. Semitic languages like Arabic are heavily influenced by root-words. We leverage this critical dependency of Arabic to generate captions of an image directly in Arabic using root-word based Recurrent Neural Network and Deep Neural Networks. Experimental results on datasets from various Middle Eastern newspaper websites allow us to report the first BLEU score for direct Arabic caption generation. We also compare the results of our approach with BLEU score captions generated in English and translated into Arabic. Experimental results confirm that generating image captions using root-words directly in Arabic significantly outperforms the English-Arabic translated captions using state-of-the-art methods.
