Measuring sentence similarity is useful in various research fields, such as artificial intelligence, knowledge management, and information retrieval. Several methods have been proposed to measure sentence similarity based on syntactic and/or semantic knowledge. Most proposals are evaluated on English sentences, and their accuracy can decrease when they are applied to other languages. Moreover, the results of these methods are unsatisfactory, because much relevant semantic knowledge, such as semantic class and thematic role, and syntactico-semantic knowledge, such as semantic predicates, is not taken into account. We must acknowledge that this kind of knowledge is rare in most lexical resources. Recently, the International Organization for Standardization (ISO) published the Lexical Markup Framework (LMF) ISO-24613 norm for the development of lexical resources. This norm provides, for each meaning of a lexical entry, all the semantic and syntactico-semantic knowledge in a fine-grained structure. Profiting from the availability of LMF-standardized dictionaries, we propose in this paper a generic method that enhances the measure of sentence similarity by applying semantic and syntactico-semantic knowledge. An experiment was carried out on Arabic, as this language is processed within our research team and an LMF-standardized Arabic dictionary is at hand where the semantic and the syntactico-semantic B Wafa Wali
This article describes our proposed system, named LIM-LIG. The system is designed for SemEval 2017 Task 1: Semantic Textual Similarity (Track 1). LIM-LIG proposes an innovative enhancement to a word-embedding-based model for measuring semantic similarity between Arabic sentences. The main idea is to exploit the representation of words as vectors in a multidimensional space to capture their semantic and syntactic properties. IDF weighting and part-of-speech tagging are applied to the examined sentences to help identify the words that are most descriptive in each sentence. The LIM-LIG system achieves a Pearson correlation of 0.74633, ranking 2nd among all participants in the Arabic monolingual pairs STS task organized within the SemEval 2017 evaluation campaign.
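The weighting idea described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the LIM-LIG implementation: the function names, the smoothed IDF formula, the optional per-tag multipliers in `pos_weight`, and the use of a plain weighted sum followed by cosine similarity are all assumptions made for the example, and the embeddings are expected to be supplied by the caller as a plain dictionary.

```python
import math
from collections import Counter


def idf_weights(corpus):
    """Smoothed IDF per word over a list of tokenized sentences (assumed formula)."""
    n_docs = len(corpus)
    doc_freq = Counter(word for sent in corpus for word in set(sent))
    return {w: math.log((1 + n_docs) / (1 + df)) + 1.0 for w, df in doc_freq.items()}


def sentence_vector(tokens, embeddings, idf, pos_tags=None, pos_weight=None):
    """Weighted sum of word vectors; words without an embedding are skipped."""
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    for i, tok in enumerate(tokens):
        if tok not in embeddings:
            continue
        weight = idf.get(tok, 1.0)
        # Hypothetical POS weighting: multiply by a per-tag factor if provided.
        if pos_tags is not None and pos_weight is not None:
            weight *= pos_weight.get(pos_tags[i], 1.0)
        for k, x in enumerate(embeddings[tok]):
            vec[k] += weight * x
    return vec


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

With this design, a word that occurs in nearly every sentence receives an IDF close to 1, while rarer words contribute more to the sentence vector, which is one way to favor the words that are most descriptive in each sentence.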
Sentence similarity computation is increasingly used in several applications, such as question answering, machine translation, information retrieval and automatic abstracting systems. This paper first summarizes several methods for calculating the similarity between sentences that consider semantic and syntactic knowledge. Second, it presents a new sentence similarity measure that aggregates, in a linear function, three components: the lexical similarity LexSim, based on common words; the semantic similarity SemSim, based on synonymous words; and the syntactico-semantic similarity SynSemSim, based on common semantic arguments, notably thematic role and semantic class. For the word-based semantic similarity, a measure is computed to estimate the degree of semantic relatedness between words by exploiting the WordNet "is a" taxonomy. Moreover, the determination of semantic arguments is based on the VerbNet database. The proposed method yielded competitive results compared to previously proposed measures on Li's benchmark, showing a high correlation with human ratings. Furthermore, experiments performed on the Microsoft Paraphrase Corpus showed the best F-measure values compared to other measures for high similarity thresholds.
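The linear aggregation of LexSim, SemSim and SynSemSim described in this abstract could look something like the sketch below. Everything beyond the three-component linear form is an assumption for illustration: the abstract does not give the weights or the component formulas, so Jaccard overlap is swapped in for each component, a small `synonyms` dictionary stands in for WordNet, and hand-written (thematic role, semantic class) pairs stand in for VerbNet lookups.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections, 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def lex_sim(tokens1, tokens2):
    """Lexical similarity: overlap of the surface words (assumed formula)."""
    return jaccard(tokens1, tokens2)


def sem_sim(tokens1, tokens2, synonyms):
    """Semantic similarity: overlap after mapping each word to a synonym representative."""
    canon1 = [synonyms.get(t, t) for t in tokens1]
    canon2 = [synonyms.get(t, t) for t in tokens2]
    return jaccard(canon1, canon2)


def syn_sem_sim(args1, args2):
    """Syntactico-semantic similarity: overlap of (thematic role, semantic class) pairs."""
    return jaccard(args1, args2)


def sentence_similarity(s1, s2, synonyms, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Linear combination of the three components; the weights are hypothetical."""
    alpha, beta, gamma = weights
    return (
        alpha * lex_sim(s1["tokens"], s2["tokens"])
        + beta * sem_sim(s1["tokens"], s2["tokens"], synonyms)
        + gamma * syn_sem_sim(s1["args"], s2["args"])
    )
```

Each sentence is passed as a dictionary with a `tokens` list and an `args` list. When the weights sum to 1 and every component lies in [0, 1], the combined score also stays in [0, 1], which makes it straightforward to compare against a similarity threshold.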