Hate speech and abusive language have become a common phenomenon on Arabic social media. Automatic hate speech and abusive detection systems can facilitate the prohibition of toxic textual contents. The complexity, informality and ambiguity of the Arabic dialects hindered the provision of the needed resources for Arabic abusive/hate speech detection research. In this paper, we introduce the first publicly-available Levantine Hate Speech and Abusive (L-HSAB) Twitter dataset with the objective to be a benchmark dataset for automatic detection of online Levantine toxic contents. We, further, provide a detailed review of the data collection steps and how we design the annotation guidelines such that a reliable dataset annotation is guaranteed. This has been later emphasized through the comprehensive evaluation of the annotations as the annotation agreement metrics of Cohen's Kappa (k) and Krippendorff's alpha (α) indicated the consistency of the annotations.
Analysis of Web server logs is one of the important challenge to provide Web intelligent services.In this paper, we describe a framework for a recommender system that predicts the user's next requests based on their behaviour discovered from Web Logs data. We compare results from three usage mining approaches: association rules, sequential rules and generalised sequential rules. We use two selection rules criteria: highest confidence and lastsubsequence. Experiments are performed on three collections of real usage data: one from an Intranet Web site and two from an Internet Web site.
Abstract-M-learning has enhanced the e-learning by making the learning process learner-centered. However, enforcing exam security in open environments where each student has his/her own mobile/tablet device connected to a Wi-Fi network through which it is further connected to the Internet can be one of the most challenging tasks. In such environments, students can easily exchange information over the network during exam time. This paper aims to identify various vulnerabilities that may violate exam security in mlearning environments and to design the appropriate security services and countermeasures that can be put in place to ensure exam security. It also aims to integrate the resulting secure exam system with an existing, open-source, and widely accepted Learning Management System (LMS) and its service extension to the m-learning environment, namely "the Moodbile Project".
For brands, gaining new customer is more expensive than keeping an existing one. Therefore, the ability to keep customers in a brand is becoming more challenging these days. Churn happens when a customer leaves a brand to another competitor. Most of the previous work considers the problem of churn prediction using CDRs. In this paper, we use micro-posts to classify customers into churny or nonchurny. We explore the power of CNNs since they achieved state-of-the-art in various computer vision and NLP applications. However, the robustness of end-toend models has some limitations such as the availability of a large amount of labeled data and uninterpretability of these models. We investigate the use of CNNs augmented with structured logic rules to overcome or reduce this issue. We developed our system called Churn teacher by using an iterative distillation method that transfers the knowledge, extracted using just the combination of three logic rules, directly into the weight of Deep Neural Networks (DNNs). Furthermore, we used weight normalization to speed up training our convolutional neural networks. Experimental results showed that with just these three rules, we were able to get state-of-the-art on publicly available Twitter dataset about three Telecom brands.
In this paper, we describe our contribution in SemEval-2018 contest. We tackled task 1 "Affect in Tweets", subtask E-c "Detecting Emotions (multi-label classification)". A multilabel classification system Tw-StAR was developed to recognize the emotions embedded in Arabic, English and Spanish tweets. To handle the multi-label classification problem via traditional classifiers, we employed the binary relevance transformation strategy while a TF-IDF scheme was used to generate the tweets' features. We investigated using single and combinations of several preprocessing tasks to further improve the performance. The results showed that specific combinations of preprocessing tasks could significantly improve the evaluation measures. This has been later emphasized by the official results as our system ranked 3 rd for both Arabic and Spanish datasets and 14 th for the English dataset.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.