In this paper we propose WSD-AL, an unsupervised system for Arabic word sense disambiguation. We first pre-process the corpus texts that contain the ambiguous word and extract the most relevant words. We then apply the Context-Matching algorithm, which returns a semantic coherence score identifying the context of use that is semantically closest to the original sentence. These contexts are generated from the glosses of the ambiguous word and the corpus. The results obtained by the proposed system are satisfactory: the disambiguation rate reached is 78.
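As an illustration of how contexts of use might be generated from glosses and a corpus, here is a minimal Python sketch. It assumes exact token matching between gloss words and corpus sentences; the function name `build_contexts`, the transliterated toy word "ayn" (eye / water spring), and the toy glosses and corpus are all hypothetical and are not taken from the system itself.

```python
def build_contexts(target, glosses, corpus):
    """Group corpus sentences by the sense whose gloss words they contain.

    target  : the ambiguous word (a single token)
    glosses : dict mapping a sense label to a set of gloss words
    corpus  : list of whitespace-tokenizable sentences
    """
    contexts = {sense: [] for sense in glosses}
    for sentence in corpus:
        tokens = sentence.split()
        # Only sentences that contain the ambiguous word are candidates.
        if target not in tokens:
            continue
        for sense, gloss_words in glosses.items():
            # Exact string matching: a gloss word must appear verbatim.
            if any(word in tokens for word in gloss_words):
                contexts[sense].append(sentence)
    return contexts


# Hypothetical toy data (transliterated for readability).
glosses = {
    "eye": {"basar", "wajh"},
    "spring": {"maa", "nabaa"},
}
corpus = [
    "ayn al wajh jamila",
    "shariba min ayn maa barida",
    "kitab jadid",
]
contexts = build_contexts("ayn", glosses, corpus)
```

Each sense ends up with the corpus sentences that mention both the ambiguous word and at least one of that sense's gloss words; a real system would apply the pre-processing steps before matching.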
We propose a new approach for determining the appropriate sense of Arabic words. To this end, we propose an algorithm based on information retrieval measures that identifies the context of use closest to the sentence containing the word to be disambiguated. A context of use is a set of sentences that indicates a particular sense of the ambiguous word. These contexts are generated using the words that define the senses of the ambiguous word, the exact string-matching algorithm, and the corpus. We use measures from the domain of information retrieval (Harman, Croft, and Okapi), combined with the Lesk algorithm, to select the correct sense from among the candidate senses.
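The selection step can be sketched as follows, under stated assumptions. The sketch scores each sense's contexts against the sentence with a common textbook variant of Okapi BM25 and returns the highest-scoring sense, in the spirit of Lesk-style overlap. The Harman and Croft measures are omitted, and the exact formula, the parameters `k1` and `b`, the function names, and the transliterated toy data are illustrative assumptions rather than the configuration used in the paper.

```python
import math
from collections import Counter


def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of a tokenized doc for a tokenized query."""
    avg_len = sum(len(d) for d in docs) / len(docs)
    tf = Counter(doc)
    score = 0.0
    for term in set(query):
        if term not in tf:
            continue
        # Document frequency: number of sense documents containing the term.
        df = sum(1 for d in docs if term in d)
        idf = math.log(1 + (len(docs) - df + 0.5) / (df + 0.5))
        norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
        score += idf * tf[term] * (k1 + 1) / norm
    return score


def disambiguate(sentence, target, contexts):
    """Return the sense whose contexts best match the sentence, plus all scores."""
    # The ambiguous word itself carries no evidence, so drop it from the query.
    query = [tok for tok in sentence.split() if tok != target]
    # One pseudo-document per sense: the concatenation of its contexts.
    docs = {s: " ".join(sents).split() for s, sents in contexts.items()}
    all_docs = list(docs.values())
    scores = {s: bm25_score(query, d, all_docs) for s, d in docs.items()}
    return max(scores, key=scores.get), scores


# Hypothetical toy contexts of use for the transliterated word "ayn".
contexts = {
    "eye": ["ayn al wajh jamila", "alam fi ayn al tifl"],
    "spring": ["shariba min ayn maa barida", "ayn maa fi al jabal"],
}
sense, scores = disambiguate("wajada al musafir ayn maa fi al sahra", "ayn", contexts)
```

In this toy run the word "maa" occurs only in the "spring" contexts, so it receives a higher inverse document frequency than words shared by both senses and drives the choice of sense.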