Hate speech and abusive language spreading on social media need to be detected automatically to avoid conflicts between citizens. Moreover, hate speech has a target, a category, and a level, which also need to be detected to help the authorities prioritize which hate speech must be addressed immediately. This research discusses multi-label text classification for detecting abusive language and hate speech in Indonesian Twitter, including the target, category, and level of the hate speech. We use machine learning approaches with Support Vector Machine (SVM), Naive Bayes (NB), and Random Forest Decision Tree (RFDT) classifiers, and with Binary Relevance (BR), Label Power-set (LP), and Classifier Chains (CC) as the data transformation methods. We used several kinds of features: term frequency, orthography, and lexicon features. Our experimental results show that, in general, the RFDT classifier with LP as the transformation method gives the best accuracy with a fast computation time.
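The three data transformation methods named above (BR, LP, CC) can be sketched as plain data manipulations that turn a multi-label dataset into datasets a single-label classifier can learn from. This is a minimal sketch under stated assumptions, not the authors' implementation: the label names in `LABELS`, the toy feature vectors, and the helper names `binary_relevance`, `label_powerset`, and `classifier_chain` are hypothetical, and each resulting dataset would still have to be passed to a base classifier such as SVM, NB, or RFDT.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical label set; the paper's actual target/category/level labels may differ.
LABELS = ["hate_speech", "abusive", "target_individual", "target_group"]

Dataset = Tuple[List[List[float]], List[int]]


def binary_relevance(X: List[List[float]], Y: List[Set[str]]) -> Dict[str, Dataset]:
    """BR: one independent binary dataset per label (label present -> 1, else 0)."""
    return {lab: (X, [1 if lab in y else 0 for y in Y]) for lab in LABELS}


def label_powerset(Y: List[Set[str]]) -> Tuple[List[int], Dict[FrozenSet[str], int]]:
    """LP: map each distinct label combination to a single multi-class target."""
    classes: Dict[FrozenSet[str], int] = {}
    targets: List[int] = []
    for y in Y:
        key = frozenset(y)
        if key not in classes:
            classes[key] = len(classes)  # unseen combination -> new class id
        targets.append(classes[key])
    return targets, classes


def classifier_chain(X: List[List[float]], Y: List[Set[str]]) -> Dict[str, Dataset]:
    """CC (training side): the k-th binary task also sees the previous k-1 labels
    as extra features. At prediction time those extra features would be the
    chain's own earlier predictions instead of the true labels."""
    datasets: Dict[str, Dataset] = {}
    for k, lab in enumerate(LABELS):
        prev = LABELS[:k]
        X_aug = [x + [1.0 if p in y else 0.0 for p in prev] for x, y in zip(X, Y)]
        datasets[lab] = (X_aug, [1 if lab in y else 0 for y in Y])
    return datasets


if __name__ == "__main__":
    X = [[0.1, 0.0], [0.5, 0.2], [0.0, 0.9]]
    Y = [{"hate_speech", "target_group"}, {"abusive"}, {"hate_speech", "target_group"}]
    print(binary_relevance(X, Y)["abusive"][1])          # [0, 1, 0]
    print(label_powerset(Y)[0])                          # [0, 1, 0]
    print(classifier_chain(X, Y)["target_group"][0][0])  # [0.1, 0.0, 1.0, 0.0, 0.0]
```

LP turns the task into ordinary multi-class classification, so the base classifier sees label co-occurrence directly; the usual trade-off is that the number of classes grows with every new label combination, whereas BR keeps one binary task per label and ignores co-occurrence.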
Nowadays social media is often misused to spread hate speech. Spreading hate speech is an act that needs to be handled in a special way because it can undermine or discriminate against other people and cause conflicts leading to both material and immaterial losses. There are several challenges in building a hate speech identification system; one of them is identifying hate speech in a multilingual scope. In this paper, we adapt and compare two methods of multilingual text classification, the translated method (with and without language identification) and the non-translated method, for multilingual hate speech identification (covering Hindi, English, and Indonesian) using a machine learning approach. We use several classification algorithms (classifiers), namely Support Vector Machine (SVM), Naive Bayes (NB), and Random Forest Decision Tree (RFDT), with word n-grams and char n-grams (character n-grams) as feature extraction. Our experimental results show that the non-translated method gives the best result. However, the use of the non-translated method needs to be reconsidered because it requires more cost for data collection and annotation. Meanwhile, the translated method without language identification gives poor results. To address this problem, we combine the translated method with monolingual hate speech identification, and the experimental results show that this approach increases multilingual hate speech identification performance compared to the translated method without language identification. This paper discusses the advantages and disadvantages of all the methods and future work to enhance the performance of multilingual hate speech identification.
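The two feature extractions mentioned above, word n-grams and char n-grams, can be sketched in a few lines. This is a minimal sketch assuming simple lowercasing and whitespace tokenization (the paper's exact preprocessing is not stated here); the function names and the `w:`/`c:` prefixes that keep the two feature spaces apart are hypothetical. The resulting counts would then be vectorized and passed to a classifier such as SVM, NB, or RFDT.

```python
from collections import Counter
from typing import Iterable, List


def word_ngrams(text: str, n: int) -> List[str]:
    """Contiguous sequences of n lowercased, whitespace-separated tokens."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text: str, n: int) -> List[str]:
    """Contiguous sequences of n lowercased characters (spaces included)."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def ngram_features(text: str, word_n: Iterable[int] = (1, 2),
                   char_n: Iterable[int] = (3,)) -> Counter:
    """Bag of word and char n-gram counts for one document."""
    feats: Counter = Counter()
    for n in word_n:
        feats.update("w:" + g for g in word_ngrams(text, n))
    for n in char_n:
        feats.update("c:" + g for g in char_ngrams(text, n))
    return feats


if __name__ == "__main__":
    print(word_ngrams("Saya tidak suka", 2))  # ['saya tidak', 'tidak suka']
    print(char_ngrams("abcd", 3))             # ['abc', 'bcd']
```

Character n-grams are a common choice for noisy, multilingual social media text because misspellings and shared word stems still produce overlapping features, while word n-grams capture short phrases.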
Abduction (also called abductive reasoning) is a form of logical inference that starts with an observation and then seeks the best explanations for it. In this paper, we improve tabling in the contextual abduction technique with an advanced tabling feature of XSB Prolog, namely tabling with interned terms. This feature enables us to store abductive solutions as interned ground terms only once, in a global area, so that table space for abductive solutions is used more efficiently. We implemented this improvement in a prototype called TABDUAL+INT. The experimental results show that tabling with interned terms is relatively slower than tabling without interned terms when returning the first solution of a subgoal, but relatively faster when returning all solutions of a subgoal. Furthermore, tabling with interned terms uses table space more efficiently than tabling without interned terms when performing abduction, in both artificial and real-world cases.
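The space saving from interned terms can be illustrated with hash-consing: each distinct ground term is stored once in a global store, and every table entry holds only a reference to it. This is a Python sketch of that general idea only, not XSB Prolog's implementation or API; the term encoding (atoms as strings, compound terms as `(functor, arg1, ..., argN)` tuples) and the names `InternTable` and `AbductiveAnswerTable` are hypothetical.

```python
import sys
from typing import Dict, List, Tuple, Union

# Hypothetical encoding: a ground term is an atom (str) or a compound tuple
# (functor, arg1, ..., argN).
Term = Union[str, Tuple]


class InternTable:
    """Hash-consing: each distinct ground term is stored once in a global store,
    and structurally equal terms are returned as the same shared object."""

    def __init__(self) -> None:
        self._store: Dict[Tuple, Tuple] = {}

    def intern(self, term: Term) -> Term:
        if isinstance(term, str):
            return sys.intern(term)  # atoms: reuse Python's string interning
        # Intern arguments first so shared subterms are also stored only once.
        key = tuple(self.intern(arg) for arg in term)
        return self._store.setdefault(key, key)

    def size(self) -> int:
        """Number of distinct compound terms held in the global store."""
        return len(self._store)


class AbductiveAnswerTable:
    """Per-subgoal answer lists holding references to interned solutions
    instead of private copies."""

    def __init__(self, interner: InternTable) -> None:
        self.interner = interner
        self.answers: Dict[str, List[Term]] = {}

    def add(self, subgoal: str, solution: Term) -> None:
        self.answers.setdefault(subgoal, []).append(self.interner.intern(solution))


if __name__ == "__main__":
    interner = InternTable()
    table = AbductiveAnswerTable(interner)
    # The same abductive solution found for two different subgoals.
    table.add("wet_grass", ("sol", "rain", ("not", "sprinkler")))
    table.add("wet_road", ("sol", "rain", ("not", "sprinkler")))
    print(table.answers["wet_grass"][0] is table.answers["wet_road"][0])  # True
    print(interner.size())  # 2
```

In this sketch, an identical solution returned by several subgoals costs one stored copy plus one reference per subgoal, at the price of a lookup each time a new solution is interned.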