SEMAR: An Interface for Indonesian Hate Speech Detection Using Machine Learning

Rohmawati, Umu Amanah Nur; Sihwi, Sari Widya; Cahyani, Denis Eka

doi:10.1109/isriti.2018.8864484

Cited by 11 publications

(4 citation statements)

References 9 publications

“…For text classification using the machine-learning approach, researchers have used several models to classify whether a text contain hate speech and abusive language or not including Naive Bayes (NB) [5], [44], [1], [20], [40], [4], [24], [25], [38], Support Vector Machine (SVM) [5], [34], [44], [1], [20], [40], [7], [24], [25], [38], [27], Logistic Regression (LR) [5], [39], [44], [40], [7], [27], Decision Tree (DT) [44], Random Forest Decision Tree (RFDT) [5], [39], [1], [20], [7], [24], [25], [38], [27], k-Nearest Neighbor (kNN) [34], [44], Latent Semantic Analysis (LSA) [3], Maximum Entropy [20], [19], and Artificial Neural Network (ANN) [49]. These machine-learning models are usually combined with several text features including word n-grams [5], [39], [1], [40], [7], [49], [4], [24], [25], [38], [27], character n-grams [5], [39], [1], [40], ...…”

Section: Methods (mentioning)

confidence: 99%

Hate speech and abusive language detection in Indonesian social media: Progress and challenges

Ibrohim, Budi

2023

Heliyon

“…Before classifying the data, it is necessary to carry out several preprocessing procedures. Case folding involves changing words in a text into uniform lowercase letters to facilitate further processing [18,19]. Stop Word Removal, stop word is a common word that often appears in a sentence but has no meaning [18].…”

Section: Preprocessing (mentioning)

confidence: 99%

“…Case folding involves changing words in a text into uniform lowercase letters to facilitate further processing [18,19]. Stop Word Removal, stop word is a common word that often appears in a sentence but has no meaning [18]. Removing stop words can increase the signal-to-noise ratio in unstructured text and thus increase the statistical significance of terms that may be important for a specific task [20].…”

Section: Preprocessing (mentioning)

confidence: 99%
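The two preprocessing steps quoted above, case folding and stop word removal, can be sketched as follows. This is a minimal illustration, not the citing authors' implementation: the function names, the regex tokenizer, and the very short `STOP_WORDS` set are all assumptions made for the example.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only;
# a real system would load a full Indonesian or Banjarese list.
STOP_WORDS = {"yang", "dan", "di", "ke", "dari", "ini", "itu"}


def case_fold(text):
    """Case folding: convert the whole text to lowercase."""
    return text.lower()


def tokenize(text):
    """Split lowercased text into alphanumeric tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text)


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that appear in the stop-word set."""
    return [t for t in tokens if t not in stop_words]


def preprocess(text):
    """Apply case folding, tokenization, then stop word removal."""
    return remove_stop_words(tokenize(case_fold(text)))
```

Calling `preprocess` on a raw comment returns the lowercased tokens that survive the stop-word filter, which is the form a feature extractor would then consume.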

Hate Speech Detection for Banjarese Languages on Instagram Using Machine Learning Methods

Alkaff, Miqdad, Fachrurrazi et al.

2023

matrik


Hate speech refers to verbal expression or communication that aims to provoke or discriminate against individuals. The Ministry of Communication and Information of Indonesia encountered and dealt with 3,640 cases of hate speech transmitted through digital channels between 2018 and 2021. Particularly in South Kalimantan, hate speech in the local language, Banjarese, has become increasingly prevalent in recent years. Surprisingly, there is a lack of research on using machine learning to detect hate speech in the Banjarese language, specifically on Instagram. Therefore, this study aimed to address this gap by constructing a dataset of Banjarese language hate speech and comparing various feature extraction and machine learning models to detect Banjarese language hate speech effectively. This research used several feature extraction techniques and machine learning methods to detect Banjarese language hate speech. The feature extraction methods used were Word N-Gram, Term Frequency-Inverse Document Frequency (TF-IDF), a combination of Word N-Gram and TF-IDF, Word2Vec, and GloVe, while the machine learning methods used were Support Vector Machine (SVM), Naïve Bayes, and Decision Tree. The results of this study revealed that the combination of TF-IDF for feature extraction and SVM as the model achieved exceptional performance. The average Recall, Precision, Accuracy, and F1-Score exceeded 90%, demonstrating the model’s ability to identify Banjarese hate speech accurately.
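As a rough illustration of the TF-IDF feature extraction named in this abstract, the sketch below computes per-document term weights in plain Python. The abstract does not say which TF-IDF variant was used, so the weighting here (relative term frequency multiplied by the natural log of N/df) and the function name `tf_idf` are assumptions, not the authors' setup.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    Returns one {term: weight} dict per input document.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights
```

Under this variant a term that occurs in every document gets weight zero, so only terms that distinguish documents contribute to the vectors a classifier such as an SVM would be trained on.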


“…We carefully reviewed each document to obtain the key information of each work. In this part, we focus on [11], [30], [17], [23], [12], [28], [27], [21], [31], [32], [33], [34], [35], [36], [37], [38], [39], [40]…”

Section: B What Has Been Done So Far In Indonesian Abusive Language D... (mentioning)

confidence: 99%

Hate Speech Detection in Bahasa Indonesia: Challenges and Opportunities

Pamungkas, Putri, Fatmawati

2023

IJACSA


This study aims to provide an overview of the current research on detecting abusive language in Indonesian social media. The study examines existing datasets, methods, and challenges and opportunities in this field. The research found that most existing datasets for detecting abusive language were collected from social media platforms such as Twitter, Facebook, and Instagram, with Twitter being the most commonly used source. The study also found that hate speech is the most researched type of abusive language. Various models, including traditional machine learning and deep learning approaches, have been implemented for this task, with deep learning models showing more competitive results. However, the use of transformer-based models is less popular in Indonesian hate speech studies. The study also emphasizes the importance of exploring more diverse phenomena, such as Islamophobia and political hate speech. Additionally, the study suggests crowdsourcing as a potential solution for the annotation approach for labeling datasets. Furthermore, it encourages researchers to consider code-mixing issues in abusive language datasets in Indonesia, as it could improve the overall model performance for detecting abusive language in Indonesian data. The study also suggests that the lack of effective regulations and the anonymity afforded to users on most social networking sites, as well as the increasing number of Twitter users in Indonesia, have contributed to the rising prevalence of hate speech in Indonesian social media. The study also notes the importance of considering code-mixed language, out-of-vocabulary words, grammatical errors, and limited context when working with social media data.

