<p>Semantic indexing and document similarity are important information retrieval problems in Big Data with broad applications. In this paper, we investigate the MapReduce programming model as a specific framework for managing distributed processing of a large number of documents. Then we study the state of the art of different approaches for computing the similarity of documents. Finally, we propose our approach to semantic similarity measures using WordNet as an external semantic network resource. For evaluation, we compare the proposed approach with previously presented approaches using our new MapReduce algorithm. Experimental results reveal that our proposed approach outperforms the state-of-the-art ones in running time and improves the measurement of semantic similarity.</p>
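To make the MapReduce framing concrete, the following is a minimal single-process sketch of pairwise document similarity expressed as map, shuffle, and reduce steps. It is an assumption-laden illustration, not the paper's algorithm: it uses plain term-frequency cosine similarity rather than a WordNet-based semantic measure, and the function names (`map_phase`, `shuffle`, `reduce_term`, `cosine_similarities`) are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def map_phase(doc_id, text):
    """Map step: emit (term, (doc_id, term_frequency)) for one document."""
    counts = defaultdict(int)
    for token in text.lower().split():
        counts[token] += 1
    for term, tf in counts.items():
        yield term, (doc_id, tf)


def shuffle(pairs):
    """Group values by key, as a MapReduce runtime would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_term(postings):
    """Reduce step: for one term, emit a partial dot product per document pair."""
    for (d1, w1), (d2, w2) in combinations(sorted(postings), 2):
        yield (d1, d2), w1 * w2


def cosine_similarities(docs):
    """Return {(doc_a, doc_b): cosine} for every pair sharing at least one term."""
    mapped = [kv for doc_id, text in docs.items() for kv in map_phase(doc_id, text)]
    postings = shuffle(mapped)

    # Squared vector norms, accumulated from the same postings lists.
    norms = defaultdict(float)
    for plist in postings.values():
        for doc_id, tf in plist:
            norms[doc_id] += tf * tf

    # Second job: sum the partial products for each document pair.
    partials = [kv for plist in postings.values() for kv in reduce_term(plist)]
    dots = {pair: sum(values) for pair, values in shuffle(partials).items()}
    return {
        (a, b): dot / math.sqrt(norms[a] * norms[b])
        for (a, b), dot in dots.items()
    }


docs = {
    "a": "big data retrieval",
    "b": "big data indexing",
    "c": "sentiment analysis",
}
similarities = cosine_similarities(docs)
```

Documents that share no term never produce a pair, which is why this inverted-index formulation tends to scale better than comparing every pair directly. A semantic variant would replace the exact term match with a lexical-resource lookup.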
Personalized Web Applications aim to improve the user's browsing experience by offering customized products and services based on their preferences and needs. A key feature of a successful personalization system is building profiles that accurately express the real interests and needs of each user. In this work, we focus on creating accurate, complete and dynamic profiles by capturing and tracking the users’ browsing activities. Moreover, we implement techniques to increase the accuracy of the retrieved user profiles by collecting more browsing data, identifying the most important concepts and removing irrelevant ones, and determining the number of levels of the concept hierarchy in the reference ontology that we should use to efficiently represent the users’ real interests and needs. The result is a complete, dynamic, and accurate user profile that can be used to give users a better personalized browsing experience.
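The profile-building steps described above (mapping visits to ontology concepts, limiting the hierarchy depth, and dropping irrelevant concepts) can be sketched as follows. This is only an illustration under stated assumptions: the slash-separated concept paths, the `build_profile` function, and the `max_level` and `min_share` parameters are hypothetical and do not come from the paper.

```python
from collections import Counter


def build_profile(visited_concepts, max_level, min_share):
    """Build a weighted concept profile from visited ontology paths.

    visited_concepts: concept paths such as "Sports/Soccer/World Cup"
        (hypothetical format, one entry per page visit).
    max_level: number of hierarchy levels to keep from the root.
    min_share: concepts below this share of all visits are treated as
        irrelevant and removed.
    """
    # Truncate every path to the chosen depth of the concept hierarchy.
    truncated = ["/".join(path.split("/")[:max_level]) for path in visited_concepts]
    counts = Counter(truncated)
    total = sum(counts.values())
    # Keep only concepts that account for a large enough share of visits.
    return {
        concept: count / total
        for concept, count in counts.items()
        if count / total >= min_share
    }


visits = [
    "Sports/Soccer/World Cup",
    "Sports/Soccer/Premier League",
    "Sports/Tennis",
    "Science/Physics/Optics",
    "Sports/Soccer/World Cup",
]
profile = build_profile(visits, max_level=2, min_share=0.3)
```

Rebuilding the profile as new visits arrive is one simple way to keep it dynamic. The depth and threshold would need tuning against real browsing data.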
Natural Language Processing problems generally require the use of pre-trained distributed word representations to be solved with deep learning models. However, distributed representations usually rely on contextual information, which prevents them from learning all the important word characteristics. The task of sentiment analysis suffers from this problem because sentiment information is ignored during the process of learning word embeddings. The performance of sentiment analysis can be affected since two words with similar vectors may have opposite sentiment orientations. The present paper introduces a novel model called Continuous Sentiment Contextualized Vectors (CSCV) to address this problem. The proposed model can learn a word's sentiment embedding from its surrounding context words. It uses the Continuous Bag-of-Words (CBOW) model to deal with the context and sentiment lexicons to identify sentiment. Existing pre-trained vectors are then combined with the obtained sentiment vectors using Principal Component Analysis (PCA) to enhance their quality. The experiments show that: (1) CSCV vectors can be used to enhance any pre-trained word vectors; (2) the resulting vectors strongly alleviate the problem of similar words with opposite polarities; (3) the performance of sentiment classification is improved by applying this approach.
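The problem of similar vectors with opposite polarities can be illustrated with a small sketch. All vectors below are invented toy values, and the example simplifies the abstract's method: it only concatenates a one-dimensional sentiment component onto a context-based vector and omits both the CBOW training and the PCA step that the abstract describes.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical context-based vectors: "good" and "bad" occur in similar
# contexts, so their vectors are nearly parallel.
pretrained = {
    "good": [0.90, 0.80, 0.10],
    "bad": [0.85, 0.82, 0.12],
}

# Hypothetical sentiment components, e.g. derived from a polarity lexicon.
sentiment = {
    "good": [1.0],
    "bad": [-1.0],
}


def combine(word):
    """Concatenate the context vector with the sentiment component."""
    return pretrained[word] + sentiment[word]


before = cosine(pretrained["good"], pretrained["bad"])
after = cosine(combine("good"), combine("bad"))
```

With these toy values, `before` is close to 1 while `after` is much lower, which is the kind of separation a sentiment-aware representation aims for. A dimensionality reduction step would then bring the combined vectors back to a fixed size.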
In December 2019, health officials in Wuhan, a city in China, identified a novel coronavirus called SARS-CoV-2 causing pneumonia. In March 2020, the World Health Organization (WHO) declared COVID-19 a pandemic. During quarantine periods, people all over the globe were living under severe and overwhelming circumstances and expressing feelings of loneliness, dread, and anxiety. The pandemic has had a significant impact on the labor markets. As a result, several employees have lost their jobs while others are in grave danger of losing their positions the next day. In this paper, we developed a hybrid approach combining sentiment analysis with topic modeling to analyze the impact of the COVID-19 pandemic on Moroccan citizens. The data used in this study includes comments collected from a well-known news website in Morocco called Hespress. Our approach follows a two-step process. In the first step, we implement a topic modeling method to analyze and extract topics from Arabic comments, and in the second step, we perform topic-based sentiment analysis to classify people’s feedback on extracted topics. The final results revealed that the expressed sentiments regarding all the topics are highly negative.
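The two-step process described above can be sketched as a small pipeline: first assign each comment to a topic, then tally sentiment labels per topic. This is a toy illustration under loud assumptions: keyword matching stands in for a real topic model, a tiny hand-written lexicon stands in for a sentiment classifier, the comments are in English rather than Arabic, and every name and word list here is hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical keyword sets standing in for topics a topic model would learn.
TOPIC_KEYWORDS = {
    "employment": {"job", "jobs", "salary", "work"},
    "health": {"hospital", "vaccine", "virus"},
}

# Hypothetical toy polarity lexicon standing in for a sentiment classifier.
SENTIMENT_LEXICON = {
    "lost": -1, "fear": -1, "afraid": -1, "crowded": -1,
    "safe": 1, "thanks": 1,
}


def assign_topic(tokens):
    """Step 1: pick the topic with the most keyword hits, or None if no hit."""
    scores = {
        topic: sum(token in keywords for token in tokens)
        for topic, keywords in TOPIC_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def classify_sentiment(tokens):
    """Step 2: label a comment by the sign of its summed lexicon score."""
    score = sum(SENTIMENT_LEXICON.get(token, 0) for token in tokens)
    if score < 0:
        return "negative"
    if score > 0:
        return "positive"
    return "neutral"


def topic_sentiment(comments):
    """Count sentiment labels per topic, skipping comments with no topic."""
    result = defaultdict(Counter)
    for comment in comments:
        tokens = comment.lower().split()
        topic = assign_topic(tokens)
        if topic is not None:
            result[topic][classify_sentiment(tokens)] += 1
    return {topic: dict(counts) for topic, counts in result.items()}


comments = [
    "I lost my job",
    "the hospital is crowded",
    "thanks for the vaccine",
    "nice weather",
]
summary = topic_sentiment(comments)
```

Keeping the two steps as separate functions means either stand-in can be swapped for a trained model without changing the aggregation.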