2015
DOI: 10.1007/978-3-319-24069-5_15

Supervised Learning to Measure the Semantic Similarity Between Arabic Sentences

Abstract: This article describes our proposed system, named LIM-LIG, which is designed for SemEval 2017 Task 1: Semantic Textual Similarity (Track 1). LIM-LIG proposes an innovative enhancement to a word embedding-based model for measuring the semantic similarity between Arabic sentences. The main idea is to exploit the representation of words as vectors in a multidimensional space to capture their semantic and syntactic properties. IDF weighting and Part-of-Speech tagging are applied to the examined sentences to …
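The abstract is truncated, so it does not say how the IDF weights and Part-of-Speech tags enter the similarity score. As a rough illustration of the general idea only, the sketch below assumes a sentence vector is an IDF-weighted sum of its word vectors and that two sentences are compared by cosine similarity. The toy embeddings, the Latin placeholder tokens standing in for Arabic words, the smoothed IDF formula, and all function names are assumptions made for this example, not details of LIM-LIG. Part-of-Speech weighting is left out.

```python
import math
from collections import Counter

def idf_weights(corpus):
    """Smoothed IDF per token: log((1 + N) / (1 + df)) + 1 (an assumed variant)."""
    n_docs = len(corpus)
    doc_freq = Counter(tok for sent in corpus for tok in set(sent))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0
            for tok, df in doc_freq.items()}

def sentence_vector(tokens, embeddings, idf):
    """IDF-weighted sum of word vectors; tokens without an embedding are skipped."""
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    for tok in tokens:
        if tok in embeddings:
            weight = idf.get(tok, 1.0)  # unseen tokens get a neutral weight
            for i, x in enumerate(embeddings[tok]):
                vec[i] += weight * x
    return vec

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical 2-dimensional embeddings; a real system would load
# pretrained Arabic word vectors instead.
EMBEDDINGS = {
    "the": [0.5, 0.5],
    "cat": [1.0, 0.0],
    "kitten": [0.9, 0.1],
    "car": [0.0, 1.0],
}
CORPUS = [["the", "cat"], ["the", "kitten"], ["the", "car"]]

IDF = idf_weights(CORPUS)
s_cat, s_kitten, s_car = (sentence_vector(s, EMBEDDINGS, IDF) for s in CORPUS)

sim_related = cosine(s_cat, s_kitten)
sim_unrelated = cosine(s_cat, s_car)
print(f"cat vs kitten: {sim_related:.3f}, cat vs car: {sim_unrelated:.3f}")
```

In this toy setup the frequent token "the" receives the lowest weight, so the content words dominate each sentence vector and the related pair scores higher than the unrelated one.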

Cited by 13 publications (5 citation statements)
References 5 publications
“…Another shortcoming of previous studies focusing on the Arabic language is that the semantic similarity analyses employed have not utilized enough resources (e.g., tools and benchmark data) due to a lack of availability. One such research work was conducted by [15], who determined semantic similarity at the sentence level using supervised learning. Specifically, their method analyzed semantic, lexical, and syntactic-semantic features, which were extracted using an Arabic dictionary, a lexical markup framework, and a learning corpus.…”
Section: Related Work (mentioning)
confidence: 99%
“…Based on what Wali et al [46] discussed, we noted that most of the previous researches mentioned above estimated the semantic similarity based only on the word order or the syntactic dependency and the synonymy relationship between terms in sentences without taking into consideration the semantic arguments namely the semantic class and thematic role in computing the semantic similarity. Wali et al [46] presented a hybrid method for measuring semantic similarity between sentences depending on supervised learning and three linguistics features (Lexical, Semantic and Syntactic-Semantic) extracted from learning corpus and Arabic dictionaries like LMF dictionary. This is a two-phase method: the learning phase, which consists of two processes: the pre-processing process that aimed to have an annotated corpus and the training process that is used to catch a hyperplane equation via the learning algorithm.…”
Section: Literature Review (mentioning)
confidence: 99%
“…Following the trend of combining detection methods, we see the analysis of non-textual content features as a promising component of future integrated detection approaches. Surprisingly many papers in our collection addressed plagiarism detection for Arabic and Persian texts (e.g., References [22,118,231,262]). The interest in plagiarism detection for the Arabic language led the organizers of the PAN competitions to develop an Arabic corpus for intrinsic plagiarism detection [34].…”
Section: Extrinsic Plagiarism Detection (mentioning)
confidence: 99%
“…In 2015, the PAN organizers also introduced a shared task on plagiarism detection for Arabic texts [32], followed by a shared task for Persian texts one year later [22]. While these are promising steps toward improving plagiarism detection for Arabic, Wali et al [262] noted that the availability of corpora and lexicons for Arabic is still insufficient when compared to other languages. This lack of resources and the complex linguistic features of the Arabic language cause plagiarism detection for Arabic to remain a significant research challenge [262].…”
Section: Extrinsic Plagiarism Detection (mentioning)
confidence: 99%