Objectives: This study aims to develop a hate speech detection model for Afan Oromo's texts on social networks like Facebook and Twitter using a machine learning algorithm. Methods: we collected comments and posts from social media like Facebook and Twitter pages of BBC Afan Oromo, OBN Afan Oromo, Fana Afan Oromo Program, Politicians, Activists, Religious Men, and Oromia Communication Bureau using Face pager tool. The collected data was labelled using Afan Oromo hate speech evaluation system we developed. Text preprocessing tasks applied on data to remove special characters, stop-words, HTML Tags, extra whitespaces, numbers, lemmatization. The n-gram and TF-IDF was applied for feature extraction task to obtain benchmark Afan Oromo hate speech detection dataset. Researchers split dataset into train and test set. Finally, we applied Support Vector Classifier, Multinomial NB, Linear Support Vector Classifier, Logistic Regression decision tree and Random Forest Classifier on 67% of trained data. The performance of proposed model also evaluated using F-score. We also test the performance of developed model by loading test set into it. Findings: Hate speech on social media violates the welfare of Ethnic groups and citizens for living together. Many researches have been doing for English, Amharic, and other Languages to detect hate content from social media. This study has focused on developing a prototype for Afan Oromo hate speech detection model using machine learning algorithms and evaluate its performance in which we found Linear Support Vector Classifier scored highest f1-score value is 64%. Novelty: Afan Oromo hate speech detection framework proposed and successfully implemented to develop Afan Oromo hate speech detection model. We wrote python script that overcome problems typos in Afan Oromo in addition to designing python scripts that recognized apostrophe " ' " as important letter for Afan Oromo word formation. Yet, no researchers have used combination of n-gram and TF-IDF for feature extraction. In this study, the n-gram and TF-IDF used for feature extraction approach to build model https://www.indjst.org/ 2567
Since most of the existing major search engines and commercial Information Retrieval (IR) systems are primarily designed for well-resourced European and Asian languages, they have paid little attention to the development of Cross-Language Information Access (CLIA) technologies for resource-scarce African languages. This paper presents the authors' experience in building CLIA for indigenous African languages, with a special focus on the development and evaluation of Oromo-English-CLIR. The authors have adopted a knowledge-based query translation approach to design and implement their initial Oromo-English CLIR (OMEN-CLIR). Apart from designing and building the first OMEN-CLIR from scratch, another major contribution of this study is assessing the performance of the proposed retrieval system at one of the well-recognized international Cross-Language Evaluation Forums like the CLEF campaign. The overall performance of OMEN-CLIR was found to be very promising and encouraging, given the limited amount of linguistic resources available for severely under-resourced African languages like Afaan Oromo.
Since most of the existing major search engines and commercial Information Retrieval (IR) systems are primarily designed for well-resourced European and Asian languages, they have paid little attention to the development of Cross-Language Information Access (CLIA) technologies for resource-scarce African languages. This paper presents the authors' experience in building CLIA for indigenous African languages, with a special focus on the development and evaluation of Oromo-English-CLIR. The authors have adopted a knowledge-based query translation approach to design and implement their initial Oromo-English CLIR (OMEN-CLIR). Apart from designing and building the first OMEN-CLIR from scratch, another major contribution of this study is assessing the performance of the proposed retrieval system at one of the well-recognized international Cross-Language Evaluation Forums like the CLEF campaign. The overall performance of OMEN-CLIR was found to be very promising and encouraging, given the limited amount of linguistic resources available for severely under-resourced African languages like Afaan Oromo.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.