Souheyl Mallat scite author profile

Zouaghi et al. 2013

In this paper, the authors propose a method for lexical enrichment of Arabic queries in order to improve the performance of information retrieval systems (SRI). The method combines two types of enrichment: linguistic and contextual. The first is based on linguistic analysis (lemmatization, morphological, syntactic and semantic analysis), whose goal is to generate a descriptive list (list-desc). This list contains a set of lexical entries assigned to each significant term in the query. The second enrichment integrates contextual information derived from the corpus documents. It is based on statistical analysis using Salton's weighting functions, TF-IDF and TF-IEF. The TF-IDF function is applied to the list-desc and the documents in the corpus in order to identify relevant documents. The TF-IEF function is applied to the list-desc and the sentences belonging to the relevant documents in order to identify relevant sentences. The terms in these sentences are then weighted, and those with the highest weights, considered rich in informative and contextual importance, are added to the original query. The method was evaluated on a corpus of documents belonging to a specialized domain, and the results show its value in terms of precision and recall.
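The two-stage weighting described in this abstract can be illustrated with a minimal sketch. It assumes that TF-IEF is the sentence-level analogue of TF-IDF (the abstract does not give the formula), so the same scoring function is applied first to documents and then to the sentences of the best document. The function name `tf_idf_scores`, the toy English tokens, and the unsmoothed `log(n / df)` weighting are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter

def tf_idf_scores(query_terms, units):
    """Score each unit (a list of tokens) against query_terms.

    A unit is a document (TF-IDF) or a sentence (assumed TF-IEF analogue).
    The score is the sum of tf * idf over the query terms found in the unit.
    """
    n = len(units)
    # Number of units in which each term occurs at least once.
    df = Counter()
    for tokens in units:
        df.update(set(tokens))
    scores = []
    for tokens in units:
        counts = Counter(tokens)
        total = len(tokens) or 1
        score = 0.0
        for term in query_terms:
            if counts[term]:
                tf = counts[term] / total
                idf = math.log(n / df[term])
                score += tf * idf
        scores.append(score)
    return scores

# Hypothetical descriptive list and a toy corpus: each document is a list of sentences.
list_desc = ["retrieval", "query"]
docs = [
    [["query", "expansion", "helps", "retrieval"], ["weights", "rank", "terms"]],
    [["event", "extraction", "from", "text"]],
]

# Stage 1: rank documents against the descriptive list.
doc_tokens = [[tok for sent in doc for tok in sent] for doc in docs]
doc_scores = tf_idf_scores(list_desc, doc_tokens)
best = max(range(len(docs)), key=doc_scores.__getitem__)

# Stage 2: rank the sentences of the best document with the same function.
sent_scores = tf_idf_scores(list_desc, docs[best])
```

In this toy run the first document and its first sentence receive the only non-zero scores, because they are the only units that contain terms from the descriptive list. A real system would operate on lemmatized Arabic tokens and would keep several relevant documents rather than one.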


Events Automatic Extraction from Arabic Texts

2016

The event extraction task consists in identifying and classifying events within an open-domain text. It is still very new for the Arabic language, whereas it has reached maturity for some languages such as English and French. Event extraction has also been shown to help Natural Language Processing tasks such as Information Retrieval, Question Answering, text mining and machine translation achieve higher performance. In this article, we present an ongoing effort to build a system for event extraction from Arabic texts using the GATE platform and other tools.


Proposal of statistical method of semantic indexing for multilingual documents

2016

Semantic Network Formalism for Knowledge Representation

2015

In this paper, the authors propose a formalism for representing a knowledge base (KB) as a network. The objective is to achieve high coverage of this base. This type of network is similar to a semantic network, with the difference that the arcs are weighted by a value indicating the semantic proximity between the concepts. This semantic proximity covers taxonomic relations, synonyms, and non-taxonomic (contextual) relations. The latter are discovered using the association rules model, which is based on (i) an indexing method, (ii) the French lexical database EuroWordNet (EWNF) and (iii) the Apriori algorithm. The contextual relations are the latent relations buried in the KB, carried by the semantic context. Evaluation of the representation formalism shows a coverage of about 80% of the KB.
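As a rough illustration of how association rules could yield weighted arcs between concepts, the sketch below runs the first two levels of an Apriori-style search over sets of concepts indexed per document. It assumes that the arc value is the rule confidence; the abstract does not say how semantic proximity is computed, so this choice, the function name `contextual_arcs`, and the toy concepts are all hypothetical.

```python
from collections import Counter
from itertools import combinations

def contextual_arcs(transactions, min_support=0.5):
    """Return {(a, b): confidence of the rule a -> b} for frequent concept pairs.

    Each transaction is the set of concepts indexed in one document.
    """
    n = len(transactions)
    item_counts = Counter()
    for t in transactions:
        item_counts.update(set(t))
    # Apriori pruning: a pair can only be frequent if both concepts are frequent.
    frequent = {c for c, k in item_counts.items() if k / n >= min_support}
    pair_counts = Counter()
    for t in transactions:
        kept = sorted(set(t) & frequent)
        pair_counts.update(combinations(kept, 2))
    arcs = {}
    for (a, b), k in pair_counts.items():
        if k / n >= min_support:
            # Assumed arc weight: confidence = support(a, b) / support(antecedent).
            arcs[(a, b)] = k / item_counts[a]
            arcs[(b, a)] = k / item_counts[b]
    return arcs

# Toy index: the concepts found in four hypothetical documents.
transactions = [
    {"bank", "loan"},
    {"bank", "loan", "rate"},
    {"bank", "river"},
    {"loan", "rate"},
]
arcs = contextual_arcs(transactions, min_support=0.5)
```

Here the arc from "rate" to "loan" gets weight 1.0 because every document containing "rate" also contains "loan", while "bank" and "rate" are not linked because they co-occur in only one of four documents. Taxonomic and synonym arcs would come from the lexical database and are not modeled in this sketch.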


Integrating Bilingual Named Entities Lexicon with Conditional Random Fields Model for Arabic Named Entities Recognition

2017