Shimaa Ismail scite author profile

This work presents a new alignment word-space approach for measuring the semantic similarity between two short Arabic text snippets. The approach combines two similarity measurement methods: alignment-based and vector space-based. The vector space-based method relies on a semantic net that represents the meaning of words as vectors, and these vectors are lemmatized to enrich the search space. The alignment-based method generates an alignment word space matrix (AWSM) for the two snippets from the generated semantic word spaces. Finally, the degree of sentence-level semantic similarity is computed using a set of proposed alignment rules. Four experiments on two different datasets were carried out to evaluate the approach. The results showed that lemmatizing both the input text and the vector model improves performance. The degree of correctness of the results reaches 0.7212, which is among the two best published results for Arabic semantic similarity.
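The abstract does not spell out the alignment rules, so the following is only a minimal Python sketch of the general idea under loudly stated assumptions: a toy lemmatizer (`LEMMAS`) and toy semantic word spaces (`WORD_SPACES`) stand in for a real Arabic lemmatizer and semantic net; each cell of the alignment matrix holds the Jaccard overlap of two word spaces (1.0 for identical lemmas); and the final score is a greedy one-to-one best-match average. All names are hypothetical and none of this is the authors' method.

```python
from typing import Dict, List, Set

# HYPOTHETICAL toy data standing in for a real Arabic lemmatizer and semantic net.
LEMMAS: Dict[str, str] = {"cars": "car", "drove": "drive", "driving": "drive"}
WORD_SPACES: Dict[str, Set[str]] = {
    "car": {"car", "automobile", "vehicle"},
    "automobile": {"automobile", "car", "vehicle"},
    "drive": {"drive", "steer", "ride"},
}


def lemmatize(tokens: List[str]) -> List[str]:
    # Map each inflected token to its lemma; unknown tokens are kept unchanged.
    return [LEMMAS.get(t, t) for t in tokens]


def word_space(lemma: str) -> Set[str]:
    # Unknown words fall back to a space containing only themselves.
    return WORD_SPACES.get(lemma, {lemma})


def alignment_matrix(a: List[str], b: List[str]) -> List[List[float]]:
    # ASSUMPTION: a cell scores 1.0 for identical lemmas, otherwise the
    # Jaccard overlap of the two words' semantic spaces.
    matrix = []
    for x in a:
        row = []
        for y in b:
            if x == y:
                row.append(1.0)
            else:
                sx, sy = word_space(x), word_space(y)
                row.append(len(sx & sy) / len(sx | sy))
        matrix.append(row)
    return matrix


def similarity(text_a: str, text_b: str) -> float:
    a = lemmatize(text_a.lower().split())
    b = lemmatize(text_b.lower().split())
    if not a or not b:
        return 0.0
    m = alignment_matrix(a, b)
    # ASSUMPTION: greedy alignment; take the highest remaining cell, then
    # block its row and column so each word is aligned at most once.
    cells = sorted(
        ((m[i][j], i, j) for i in range(len(a)) for j in range(len(b))),
        reverse=True,
    )
    used_rows: Set[int] = set()
    used_cols: Set[int] = set()
    total = 0.0
    for score, i, j in cells:
        if i not in used_rows and j not in used_cols:
            used_rows.add(i)
            used_cols.add(j)
            total += score
    # Normalize by the longer text so unmatched words lower the score.
    return total / max(len(a), len(b))
```

For example, `similarity("the car drove", "the automobile driving")` returns `1.0`, because lemmatization maps `drove` and `driving` to `drive` and the word spaces of `car` and `automobile` coincide. Greedy matching is used only for brevity; an optimal assignment (for instance the Hungarian algorithm) would be a natural alternative.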


Arabic Semantic-Based Textual Similarity

Ismail, Alsammak, El-Shishtawy (2022). Benha Journal of Applied Sciences.


Textual similarity is one of the most important aspects of information retrieval. This paper presents several semantic textual similarity techniques as well as the factors that influence them. Two hybrid approaches for measuring the degree of similarity between two Arabic text snippets are presented. The first approach combines word-based and vector-based similarity methods to construct a semantic word space for each word of the input text. These words are represented in their lemma forms to capture all semantically related words. The semantic word spaces are then used to find the best matching between the words of the two input texts, from which the degree of similarity between the two snippets is computed. The second approach combines semantic and syntactic methods. Its core is the Levenshtein edit distance, modified to measure the edit cost at the token level rather than the character level. In addition, the semantic word spaces are incorporated so that semantic features complement the syntactic ones, and further techniques are embedded to overcome problems of the syntactic approach, such as sensitivity to word order. The Pearson correlation coefficient, computed against two benchmark datasets, is used to measure the degree of correctness of the two approaches. The experiments achieved 0.7212 and 0.7589 for the two proposed approaches on two different datasets.
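The core idea of the second approach, a Levenshtein distance computed over tokens instead of characters, can be sketched in Python as follows. This is a hypothetical illustration, not the authors' implementation: `WORD_SPACES` is a toy stand-in for the semantic word spaces, a substitution is assumed to cost nothing when two tokens share meaning, and the distance is assumed to be normalized by the length of the longer text.

```python
from typing import Dict, List, Set

# HYPOTHETICAL toy word spaces: each lemma maps to semantically related lemmas.
WORD_SPACES: Dict[str, Set[str]] = {
    "car": {"car", "automobile", "vehicle"},
    "automobile": {"automobile", "car", "vehicle"},
}


def same_meaning(x: str, y: str) -> bool:
    # Tokens match if identical or if either lies in the other's word space.
    return x == y or y in WORD_SPACES.get(x, set()) or x in WORD_SPACES.get(y, set())


def token_edit_distance(a: List[str], b: List[str]) -> int:
    # Classic Levenshtein dynamic program over tokens, keeping one row at a time.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, y in enumerate(b, start=1):
            sub = 0 if same_meaning(x, y) else 1  # ASSUMPTION: free semantic substitution
            curr[j] = min(
                prev[j] + 1,        # delete token x
                curr[j - 1] + 1,    # insert token y
                prev[j - 1] + sub,  # substitute x with y
            )
        prev = curr
    return prev[len(b)]


def edit_similarity(text_a: str, text_b: str) -> float:
    a, b = text_a.lower().split(), text_b.lower().split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty texts are treated as identical
    return 1.0 - token_edit_distance(a, b) / longest
```

With these toy spaces, `edit_similarity("the car is red", "the automobile is red")` returns `1.0` and `edit_similarity("the car is red", "the car is fast")` returns `0.75`. Swapped words are penalized: `edit_similarity("a b", "b a")` returns `0.0`, which illustrates the word-order problem that the abstract says additional techniques are needed to overcome.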



Shimaa Ismail

A Generic Approach for Extracting Aspects and Opinions of Arabic Reviews

A New Alignment Word-Space Approach for Measuring Semantic Similarity for Arabic Text

Arabic Semantic-Based Textual Similarity
