In the digitally connected world that we are living in, people expect to get answers to their questions spontaneously. This expectation increased the burden on Question/Answer platforms such as Stack Overflow and many others. A promising solution to this problem is to detect if a question being asked is similar to a question in the database, then present the answer of the detected question to the user. To address this challenge, we propose a novel Natural Language Processing (NLP) approach that detects if two Arabic questions are similar or not using their extracted morphological, syntactic, semantic, lexical, overlapping, and semantic lexical features. Our approach involves several phases including Arabic text processing, novel feature extraction, and text classifications. Moreover, we conducted a comparison between seven different machine learning classifiers. The included classifiers are: Support Vector Machine (SVM), Decision Tree (DT), Logistic Regression (LR), Extreme Gradient Boosting (XGB), Random Forests (RF), Adaptive Boosting (AdaBoost), and Multilayer Perceptron (MLP). To conduct our experiments, we used a real-world questions dataset consisting of around 19,136 questions (9,568 pairs of questions) in which our approach achieved 82.93% accuracy using our XGB model on the best features selected by the Random Forest feature selection technique. This high accuracy of our model shows the ability of our approach to correctly detect similar Arabic questions and hence increases user satisfactions.
In the digitally connected world that we are living in, people expect to get answers to their questions spontaneously. This expectation increased the burden on Question/Answer platforms such as Stack Overflow and many others. A promising solution to this problem is to detect if a question being asked is similar to a question in the database, then present the answer of the detected question to the user. To address this challenge, we propose a novel Natural Language Processing (NLP) approach that detects if two Arabic questions are similar or not using their extracted morphological, syntactic, semantic, lexical, overlapping, and semantic lexical features. Our approach involves several phases including Arabic text processing, novel feature extraction, and text classifications. Moreover, we conducted a comparison between seven different machine learning classifiers. The included classifiers are: Support Vector Machine (SVM), Decision Tree (DT), Logistic Regression (LR), Extreme Gradient Boosting (XGB), Random Forests (RF), Adaptive Boosting (AdaBoost), and Multilayer Perceptron (MLP). To conduct our experiments, we used a real-world questions dataset consisting of around 19,136 questions (9,568 pairs of questions) in which our approach achieved 82.93% accuracy using our XGB model on the best features selected by the Random Forest feature selection technique. This high accuracy of our model shows the ability of our approach to correctly detect similar Arabic questions and hence increases user satisfactions.
In this paper, we present our submission for SemEval-2020 competition subtask 1 in Task 7 (Hossain et al., 2020a): Assessing Humor in Edited News Headlines. The task consists of estimating the hilariousness of news headlines that have been modified manually by humans using micro-edit changes to make them funny. Our approach is constructed to improve on a couple of aspects; preprocessing with an emphasis on humor sense detection, using embeddings from state-of-the-art language model (ELMo), and ensembling the results came up with using machine learning model Naïve Bayes (NB) with a deep learning pretrained models. ELMo-NB participation has scored (0.5642) on the competition leader board, where results were measured by Root Mean Squared Error (RMSE).
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.