This research aims to detect and classify websites whose content encourages hate speech toward Islam and Muslims (Islamophobia), using sentiment analysis and web text mining techniques. A large corpus was collected to identify and classify anti-Islamic online content. The target is to automatically detect websites that are hostile to Islam and transmit extremist ideas against it, with the main purpose of reducing the spread of webpages that misrepresent Islam. The data were collected from different sources, and two Arabic-language datasets, one balanced and one unbalanced, were produced. The proposed framework is described; it follows a supervised Machine Learning (ML) approach that uses Support Vector Machines (SVM) and Multinomial Naive Bayes (MNB) as classifiers and Term Frequency-Inverse Document Frequency (TF-IDF) for feature extraction. Experiments at the word level and the trigram level were conducted on both datasets, and the results were compared. The experimental results show that the word-level supervised ML approach performs best on both datasets, reaching a high accuracy of 97% on the balanced Arabic dataset with the SVM algorithm and TF-IDF feature extraction. Finally, an interactive web application prototype was developed to detect and classify toxic language such as anti-Islamic online text content.
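The feature-extraction and classification steps named above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes whitespace tokenization, interprets "trigram level" as word trigrams, uses a smoothed IDF, and pairs TF-IDF only with a small hand-written Multinomial Naive Bayes. The SVM classifier is not reproduced here; in practice a library such as scikit-learn would typically supply both classifiers. All function names (`ngrams`, `fit_idf`, `tfidf`, `train`, `classify`) and the four English toy documents are hypothetical placeholders, not data from the study.

```python
import math
from collections import Counter, defaultdict

def ngrams(text, n=1):
    """Whitespace-tokenize and return word n-grams (n=1: words, n=3: trigrams)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def fit_idf(docs, n=1):
    """Smoothed inverse document frequency for every term seen in training."""
    df = Counter()
    for doc in docs:
        df.update(set(ngrams(doc, n)))
    total = len(docs)
    return {t: math.log((1 + total) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(doc, idf, n=1):
    """Sparse {term: tf * idf} vector; terms unseen in training are dropped."""
    tf = Counter(t for t in ngrams(doc, n) if t in idf)
    return {t: c * idf[t] for t, c in tf.items()}

def train(docs, labels, n=1, alpha=1.0):
    """Fit Multinomial Naive Bayes on TF-IDF weights with additive smoothing."""
    idf = fit_idf(docs, n)
    weight = defaultdict(lambda: defaultdict(float))
    for doc, y in zip(docs, labels):
        for term, w in tfidf(doc, idf, n).items():
            weight[y][term] += w
    classes = {}
    for y, count in Counter(labels).items():
        denom = sum(weight[y].values()) + alpha * len(idf)
        classes[y] = (math.log(count / len(labels)), dict(weight[y]), denom)
    return {"idf": idf, "classes": classes, "n": n, "alpha": alpha}

def classify(model, text):
    """Return the class with the highest log-posterior score."""
    vec = tfidf(text, model["idf"], model["n"])
    alpha = model["alpha"]
    scores = {}
    for y, (log_prior, w, denom) in model["classes"].items():
        scores[y] = log_prior + sum(
            x * math.log((w.get(t, 0.0) + alpha) / denom) for t, x in vec.items())
    return max(scores, key=scores.get)

# Hypothetical toy data (assumption): stands in for the Arabic corpus.
DOCS = [
    "this group is evil and must be banned",
    "they are dangerous and evil people",
    "the mosque hosts a community dinner tonight",
    "the lecture covers history of the community",
]
LABELS = ["hostile", "hostile", "neutral", "neutral"]

word_model = train(DOCS, LABELS, n=1)      # word-level features
trigram_model = train(DOCS, LABELS, n=3)   # word-trigram features

print(classify(word_model, "evil and dangerous group"))
print(classify(trigram_model, "a community dinner tonight"))
```

Word-level features match on individual terms, while trigram features only fire when a three-word sequence from training reappears, so a trigram model sees far sparser evidence on short texts. That is one plausible reason a word-level setup can perform better, though the sketch does not demonstrate the reported 97% figure.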