Nouha Othman scite author profile

et al. 2019

Over the last few years, with the explosive growth of social media, huge numbers of rumors have spread rapidly on the internet. Indeed, the proliferation of malicious misinformation and nasty rumors on social media can have harmful effects on individuals and society. In this paper, we investigate the content of fake news in the Arab world through the information posted on YouTube. Our contribution is threefold. First, we introduce a novel Arabic corpus for the task of fake news analysis, covering the topics most affected by rumors. We describe the corpus and the data collection process in detail. Second, we present several exploratory analyses of the harvested data in order to retrieve useful knowledge about the transmission of rumors for the studied topics. Third, we test the possibility of discriminating between rumor and non-rumor comments using three machine learning classifiers, namely Support Vector Machine (SVM), Decision Tree (DT) and Multinomial Naïve Bayes (MNB).
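As a rough illustration of the third contribution, the sketch below trains a tiny Multinomial Naïve Bayes classifier with Laplace smoothing on whitespace-tokenized comments. It is only a minimal sketch: the function names, the English toy comments and their labels are hypothetical and are not taken from the corpus, and the paper's own features and preprocessing are not described here.

```python
import math
from collections import Counter, defaultdict


def train_mnb(docs, labels, alpha=1.0):
    """Collect per-class word counts for a Multinomial Naive Bayes model."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {
        "class_counts": class_counts,
        "word_counts": word_counts,
        "vocab": vocab,
        "alpha": alpha,
        "n_docs": len(docs),
    }


def predict_mnb(model, text):
    """Return the class with the highest log-posterior for the given text."""
    # Words never seen in training are ignored.
    tokens = [t for t in text.lower().split() if t in model["vocab"]]
    best_label, best_score = None, -math.inf
    for label, n_label in model["class_counts"].items():
        counts = model["word_counts"][label]
        # Laplace-smoothed denominator: class token count + alpha * |vocab|.
        denom = sum(counts.values()) + model["alpha"] * len(model["vocab"])
        score = math.log(n_label / model["n_docs"])  # log prior
        for t in tokens:
            score += math.log((counts[t] + model["alpha"]) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy comments (illustration only, not from the corpus).
train_docs = [
    "celebrity died yesterday share before deleted",
    "shocking secret they hide share now",
    "official statement confirms the schedule",
    "the ministry published the official report",
]
train_labels = ["rumor", "rumor", "non-rumor", "non-rumor"]

model = train_mnb(train_docs, train_labels)
print(predict_mnb(model, "share this shocking secret"))
print(predict_mnb(model, "the official report"))
```

A real experiment would use a proper Arabic tokenizer, a held-out test split, and the SVM and Decision Tree baselines alongside MNB.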

Enhancing Question Retrieval in Community Question Answering Using Word Embeddings

Procedia Computer Science

2019

Manhattan Siamese LSTM for Question Retrieval in Community Question Answering

Othman

2019

Community Question Answering (cQA) services are platforms where users can post their questions, expecting other users to provide them with answers. We focus on the task of question retrieval in cQA, which aims to retrieve previous questions that are similar to new queries. The past answers to the similar questions can then be used to respond to the new queries. The major challenges in this task are the shortness of the questions and the word mismatch problem, as users can formulate the same query using different wording. Although question retrieval has been widely studied over the years, it has received less attention in Arabic and still requires non-trivial effort. In this paper, we focus on this task in both Arabic and English. We propose to use word embeddings, which can capture semantic and syntactic information from contexts, to vectorize the questions. In order to get longer sequences, questions are expanded with words having close word vectors. The embedding vectors are fed into a Siamese LSTM model to consider the global context of questions. The similarity between the questions is measured using the Manhattan distance. Experiments on a real-world Yahoo! Answers dataset show the efficiency of the method in Arabic and English.
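Two steps in this abstract can be sketched in a few lines: expanding a question with words whose vectors are close, and scoring two question encodings with the Manhattan distance. The sketch below makes several assumptions. The toy embedding table and the function names are hypothetical, cosine similarity is assumed for "close word vectors", and the exp(-L1) form is the similarity commonly used in Manhattan Siamese LSTM models; the abstract itself only states that the Manhattan distance is used. The LSTM encoder is omitted and replaced by hand-written vectors.

```python
import math

# Hypothetical toy word vectors; a real system would load pretrained embeddings.
EMBEDDINGS = {
    "fix": [0.9, 0.1, 0.0],
    "repair": [0.85, 0.15, 0.05],
    "laptop": [0.1, 0.9, 0.1],
    "notebook": [0.15, 0.85, 0.1],
    "banana": [0.0, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_question(tokens, k=1):
    """Append, for each known token, its k nearest words not already present."""
    expanded = list(tokens)
    for tok in tokens:
        if tok not in EMBEDDINGS:
            continue
        neighbours = sorted(
            (w for w in EMBEDDINGS if w != tok and w not in expanded),
            key=lambda w: cosine(EMBEDDINGS[tok], EMBEDDINGS[w]),
            reverse=True,
        )
        expanded.extend(neighbours[:k])
    return expanded


def manhattan_similarity(u, v):
    """Map the L1 distance between two encodings to a score in (0, 1]."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    l1 = sum(abs(a - b) for a, b in zip(u, v))
    return math.exp(-l1)


# Hypothetical encoder outputs standing in for the two Siamese LSTM branches.
h_new = [0.20, -0.10, 0.40]
h_similar = [0.25, -0.05, 0.35]
h_unrelated = [-0.60, 0.70, -0.30]

print(expand_question(["fix", "laptop"]))
print(manhattan_similarity(h_new, h_similar))
print(manhattan_similarity(h_new, h_unrelated))
```

Identical encodings score exactly 1, and the score decays toward 0 as the L1 distance grows, which makes it convenient to rank candidate questions directly by this value.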

Learning English and Arabic question similarity with Siamese Neural Networks in community question answering services

Data & Knowledge Engineering

2022

In this paper, we tackle the task of similar question retrieval (QR), which is essential for Community Question Answering (cQA) and aims to retrieve historical questions that are semantically equivalent to the new queries. Over time, with the sharp increase of community archives and the accumulation of duplicated questions, the QR problem has become increasingly challenging due to the shortness of the community questions as well as the word mismatch problem, as users can formulate the same query using different wording. Although many efforts have been devoted to addressing this problem, existing methods mostly relied on supervised models which depend heavily on massive training data sets and manual feature engineering. Such methods are chiefly constrained by their specificities, which ignore the word order and do not capture enough syntactic and semantic information in questions. In this paper, we rely on Neural Networks (NNs), which use a deep analysis of words and questions to take into consideration the semantics as well as the structure of questions to predict the semantic text similarity. We propose a deep learning approach based on a Siamese architecture with Long Short-Term Memory (LSTM) networks, augmented with an attention mechanism that lets the model give different words different attention while modeling questions. We also explore the use of Convolutional Neural Networks (CNN) nested within the Siamese architecture to retrieve relevant questions. Different similarity measures were tested to predict the semantic similarity between the pairs of questions. To evaluate the proposed approach, we conducted experiments on large-scale datasets in English and Arabic.
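The idea of giving different words different attention can be illustrated with a simple attention pooling step over per-word hidden states. This is a sketch under stated assumptions, not the paper's exact formulation: the scoring function (a dot product with a context vector), the function names and the toy values are all hypothetical, and in a trained model both the hidden states and the context vector would be learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, context):
    """Weight each word's hidden state by its alignment with a context vector.

    Returns the pooled question vector and the per-word attention weights.
    """
    # Assumed scoring function: dot product between hidden state and context.
    scores = [sum(h_i * c_i for h_i, c_i in zip(h, context)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return pooled, weights


# Hypothetical hidden states for a three-word question (one vector per word).
hidden_states = [[0.1, 0.0], [0.9, 0.2], [0.2, 0.1]]
context = [1.0, 0.5]  # hypothetical context vector; learned in a real model

pooled, weights = attention_pool(hidden_states, context)
print(weights)
print(pooled)
```

The second word receives the largest weight because its hidden state aligns best with the context vector, so it dominates the pooled representation that would be passed to the similarity layer.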

Improving the Community Question Retrieval Performance Using Attention-Based Siamese LSTM

2020

In this paper, we focus on the problem of question retrieval in community Question Answering (cQA) which aims to retrieve from the community archives the previous questions that are semantically equivalent to the new queries. The major challenges in this crucial task are the shortness of the questions as well as the word mismatch problem as users can formulate the same query using different wording. While numerous attempts have been made to address this problem, most existing methods relied on supervised models which significantly depend on large training data sets and manual feature engineering. Such methods are mostly constrained by their specificities that put aside the word order and ignore syntactic and semantic relationships. In this work, we rely on Neural Networks (NNs) which can learn rich dense representations of text data and enable the prediction of the textual similarity between the community questions. We propose a deep learning approach based on a Siamese architecture with LSTM networks, augmented with an attention mechanism. We test different similarity measures to predict the semantic similarity between the community questions. Experiments conducted on real cQA data sets in English and Arabic show that the performance of question retrieval is improved as compared to other competitive methods.

Question Answering Passage Retrieval and Re-ranking Using N-grams and SVM

Faiz

2016

CyS

Over the last few decades, with the meteoric rise of Information Technology, Question Answering (QA) has attracted more attention and has been extensively explored. Indeed, several QA systems are based on a passage retrieval engine which aims to deliver a set of passages that are most likely to contain a relevant response to a question stated in natural language. In an attempt to enhance the performance of existing QA systems by increasing the number of correct answers generated and ensuring their relevance, we propose a novel approach for retrieving and re-ranking passages based on n-grams and SVM models. The core principle is to first rely on the dependency degree of n-gram words of the query in the passage to retrieve correct passages. Then, an SVM-based model is used to improve passage ranking by incorporating various lexical, syntactic and semantic similarity measures. Empirical evaluation performed with the CLEF dataset demonstrates the merits of our approach: the results obtained by our implemented system surpass those of previously proposed systems.

Retrieving Relevant Passages Using N-grams for Open-Domain Question Answering

Int. J. Artif. Intell. Tools

2019

Question Answering is most likely one of the toughest tasks in the field of Natural Language Processing. It aims at directly returning accurate and short answers to questions asked by users in human language from a huge collection of documents or a database. Recently, the continuous exponential rise of digital information has imposed the need for more direct access to relevant answers. Thus, question answering has been the subject of widespread attention and has been extensively explored over the last few years. Retrieving passages remains a crucial but also a challenging task in question answering. Although there has been an abundance of work on this task, it still requires non-trivial effort. In this paper, we propose an ad-hoc passage retrieval approach for Question Answering using n-grams. This approach relies on a new measure of similarity between a passage and a question for the extraction and ranking of the different passages based on n-gram overlap. More concretely, our measure is based on the dependency degree of n-gram words of the question in the passage. We validate our approach by the development of the “SysPex” system that automatically returns the most relevant passages to a given question.
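The general idea of ranking passages by n-gram overlap with the question can be sketched as follows. The abstract does not define the "dependency degree" measure, so the score below is a simple stand-in and not the paper's formula: it is the fraction of question n-grams found in the passage, with longer n-grams weighted more heavily. The function names and the example passages are hypothetical.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_overlap_score(question, passage, max_n=3):
    """Weighted share of question n-grams (n = 1..max_n) present in the passage.

    Each matched n-gram contributes a weight of n, so longer matches count more.
    """
    q = question.lower().split()
    p = passage.lower().split()
    matched = 0.0
    total = 0.0
    for n in range(1, max_n + 1):
        q_grams = ngrams(q, n)
        p_grams = set(ngrams(p, n))
        total += n * len(q_grams)
        matched += n * sum(1 for g in q_grams if g in p_grams)
    return matched / total if total else 0.0


def rank_passages(question, passages, max_n=3):
    """Sort passages from most to least similar to the question."""
    return sorted(
        passages,
        key=lambda p: ngram_overlap_score(question, p, max_n),
        reverse=True,
    )


# Hypothetical question and candidate passages (illustration only).
question = "who wrote the iliad"
passages = [
    "the iliad is an epic poem",
    "bananas are rich in potassium",
    "homer wrote the iliad in ancient greece",
]

for passage in rank_passages(question, passages):
    print(round(ngram_overlap_score(question, passage), 3), passage)
```

Weighting by n rewards passages that preserve the question's word order over passages that merely share isolated words, which is one plausible reading of why n-grams are preferred to bag-of-words matching here.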

A Multi-lingual Approach to Improve Passage Retrieval for Automatic Question Answering

Faiz

2016