Design and implementation of automatic indexing for information retrieval with Arabic documents

Hmeidi, Ismail; Kanaan, Ghassan; Evens, Martha

doi:10.1002/(sici)1097-4571(199710)48:10<867::aid-asi3>3.0.co;2-#

Cited by 35 publications

(26 citation statements)

References 6 publications

“…Chen and Gey [13] proposed an approach to the cross language retrieval which was to translate the English topics into Arabic using online English-Arabic machine translation systems, and they reported on the construction of an Arabic stop list and two Arabic stemmers, and the experiments on Arabic monolingual retrieval, English-to-Arabic cross-language retrieval. Hmeidi, Kanaan and Evens [14] have put together a corpus and designed and built an automatic IR system from scratch to handle Arabic data. They have implemented both automatic and manual indexing techniques for this corpus.…”

Section: Related Work (mentioning)

confidence: 99%

Arabic information retrieval system using the neural network model

AlHadid¹, Afaneh², Al-Tarawneh³ et al., 2014

International Journal of Advanced Research in Computer and Comm

Information Retrieval (IR) for the Arabic language has gained significant attention and has emerged as a research topic studied by both Arabic and foreign researchers. The goal of this research is to apply IR using a Neural Network (NN) model to natural-language Arabic text documents in order to solve the problem of retrieving Arabic information from a document database. Furthermore, all stored documents must be indexed with keyword classifications that describe the exact content of each document, which makes it impossible to retrieve all related documents and requires more computational time to classify and update the stored documents. IR using an NN is applied to the problems of document indexing, classification, and retrieval of related documents, using term weighting and normalization. The computational results were compared with the Vector Space Model (VSM) and showed that NN training time improved on VSM document-loading time.
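The abstract compares the NN approach with a Vector Space Model baseline built on term weighting and normalization, but it does not state which weighting scheme is used. The sketch below assumes standard tf-idf weights with cosine (L2) normalization; the function names and the toy Arabic tokens are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter

# Assumption: tf-idf weighting with L2 normalization; the paper's exact scheme is not given.

def _normalize(vec):
    """Scale a sparse vector to unit length (left unchanged if all weights are zero)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec

def tfidf_vectors(docs):
    """Build unit-length tf-idf vectors for a list of tokenized documents."""
    n = len(docs)
    # document frequency: number of documents that contain each term
    df = Counter(term for doc in docs for term in set(doc))
    vectors = [
        _normalize({t: c * math.log(n / df[t]) for t, c in Counter(doc).items()})
        for doc in docs
    ]
    return vectors, df

def rank(query, docs):
    """Return document indices ordered by cosine similarity, plus the raw scores."""
    vectors, df = tfidf_vectors(docs)
    n = len(docs)
    # query terms unseen in the collection are ignored
    q = _normalize({t: c * math.log(n / df[t])
                    for t, c in Counter(query).items() if t in df})
    # both sides are unit length, so cosine similarity is a plain dot product
    scores = [sum(w * d.get(t, 0.0) for t, w in q.items()) for d in vectors]
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return order, scores

# Hypothetical toy collection of pre-tokenized Arabic documents.
DOCS = [
    ["مكتبة", "جامعة", "كتاب"],   # library, university, book
    ["طالب", "جامعة"],            # student, university
    ["كتاب", "كتاب", "قراءة"],    # book, book, reading
]

order, scores = rank(["كتاب"], DOCS)
print(order)  # [2, 0, 1]
```

Document 2 ranks first because "كتاب" occurs twice there and makes up a larger share of its normalized vector than it does in document 0.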

Section: Related Work (mentioning)

confidence: 99%

“…Some of the Arabic IR systems that use morphology include Swift [1] and electronic publishing software developed by Sakhr that contain IR components (such as the Encyclopedia of Jurisprudence) [2]. Arabic IR studies have shown that the use of Arabic roots as indexing terms substantially improves the retrieval effectiveness over the use of words as index terms [3] [4] [5].…”

Section: Introduction (mentioning)

confidence: 99%

“…However, this paper is concerned with morphological analysis for the purpose of IR. Arabic IR is enhanced when the roots are used in indexing and searching [3] [4] [5].…”

Section: Introduction (mentioning)

confidence: 99%

Building a shallow Arabic Morphological Analyzer in one day

Darwish

2002

Proceedings of the ACL-02 Workshop on Computational Approaches to Semitic Languages -

123
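The quoted claim that roots outperform words as index terms rests on conflation: a root-based index maps many derived surface forms onto one term. This minimal sketch assumes a tiny hand-built word-to-root lexicon (hypothetical; a real system would obtain roots from a morphological analyzer) and shows a query for كتاب (book) matching only itself under word indexing, but also مكتبة (library) and يكتب (he writes) under root indexing.

```python
from collections import defaultdict

# Hypothetical hand-built lexicon mapping surface forms to roots, for illustration only.
ROOTS = {
    "كتاب": "كتب",   # book
    "مكتبة": "كتب",  # library
    "يكتب": "كتب",   # he writes
    "مدرسة": "درس",  # school
}

def index_term(word, use_roots):
    """Return the index term for a word: its root if known and requested, else the word."""
    return ROOTS.get(word, word) if use_roots else word

def build_index(docs, use_roots):
    """Build an inverted index mapping each index term to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, words in enumerate(docs):
        for w in words:
            index[index_term(w, use_roots)].add(doc_id)
    return index

def search(index, query_word, use_roots):
    """Return sorted ids of documents that share the query word's index term."""
    return sorted(index.get(index_term(query_word, use_roots), set()))

# Hypothetical toy documents: "new book", "the city library", "he writes a letter", "school".
docs = [["كتاب", "جديد"], ["مكتبة", "المدينة"], ["يكتب", "رسالة"], ["مدرسة"]]

print(search(build_index(docs, use_roots=False), "كتاب", use_roots=False))  # [0]
print(search(build_index(docs, use_roots=True), "كتاب", use_roots=True))    # [0, 1, 2]
```

Conflating forms raises recall, but a root also lumps together words with quite different meanings. This is the ambiguity trade-off later cited as a reason larger collections favored stems.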

“…Recall that ALPNET produces analysis in random order. As indicated earlier, some early work with small test collections (Al-Kharashi & Evens, 1994; Hmeidi et al., 1997) suggested that roots were a better choice than stems, but the experiments presented here found just the opposite. One possible explanation for this is that earlier test collections contained at most a few hundred documents, and scaling up the size of the collection by several orders of magnitude might reward the choice of less ambiguous terms.…”

Section: Evaluating Sebawai and Al-Stem in IR (mentioning)

confidence: 40%

“…However, often irregular roots, which contain double or weak letters, lead to stems and words that have letters from the root that are deleted or replaced. For Arabic IR, several early studies suggested that indexing Arabic text using roots significantly increases retrieval effectiveness over the use of words or stems (Abu-Salem et al., 1999; Al-Kharashi & Evens, 1994; Hmeidi et al., 1997). However, the studies used small test collections of only hundreds of documents and the morphology in many of the studies was done manually.…”

Section: Introduction (mentioning)

confidence: 99%

Adapting Morphology for Arabic Information Retrieval*

Darwish

Oard

Text, Speech and Language Technology

Abstract: This chapter presents an adaptation of existing techniques in Arabic morphology by leveraging corpus statistics to make them suitable for Information Retrieval (IR). The adaptation resulted in the development of Sebawai, a shallow Arabic morphological analyzer, and Al-Stem, an Arabic light stemmer. Both were used to produce Arabic index terms for Arabic experimentation. Sebawai is concerned with generating possible roots and stems of a given Arabic word along with probability estimates of deriving the word from each of the possible roots. The probability estimates were used as a guide to determine which prefixes and suffixes should be used to build the light stemmer Al-Stem. The use of the Sebawai-generated roots and stems as index terms, along with the stems from Al-Stem, is evaluated in an information retrieval application and the results are compared…

Introduction

Due to the morphological complexity of the Arabic language, Arabic morphology has become an integral part of many Arabic Information Retrieval (IR) and other natural language processing applications. Arabic words are divided into three types: noun, verb, and particle (Abdul-Al-Aal, 1987). Nouns and verbs are derived from a closed set of around 10,000 roots (Ibn Manzour, 2006). The roots are commonly three or four letters and are rarely five letters. Arabic nouns and verbs…

* All the experiments for this work were performed while the first author was at the University of Maryland, College Park.
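Al-Stem is described only as a light stemmer whose prefix and suffix lists were chosen with guidance from Sebawai's probability estimates; those lists are not reproduced here. The sketch below therefore uses a short, hypothetical set of common Arabic affixes and a hypothetical minimum stem length to illustrate the general light-stemming idea: strip at most one prefix and one suffix, without trying to recover the root.

```python
# Hypothetical affix lists for illustration; these are NOT Al-Stem's actual lists.
PREFIXES = ["وال", "بال", "كال", "فال", "لل", "ال", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "ية", "ه", "ة", "ي"]
MIN_STEM = 2  # hypothetical minimum number of letters to keep after each strip

def light_stem(word):
    """Strip at most one prefix and one suffix, trying longer affixes first."""
    for p in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(p) and len(word) - len(p) >= MIN_STEM:
            word = word[len(p):]
            break
    for s in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(s) and len(word) - len(s) >= MIN_STEM:
            word = word[:-len(s)]
            break
    return word

print(light_stem("والكتاب"))   # and-the-book -> كتاب
print(light_stem("المكتبات"))  # the-libraries -> مكتب
```

Unlike root extraction, "المكتبات" (the libraries) stays distinct from "كتاب" (book) here, which keeps index terms less ambiguous. The cost is that single-letter prefixes such as "و" can occasionally be stripped from words where they are part of the stem.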
