Background: Hate speech is an expression directed at a person or group that conveys hatred and/or anger toward them. On social media, users are free to express themselves, including by writing harsh words and sharing them with others, which can trigger division and conflict between groups. Researchers have detected hate speech in social media using machine learning-based and lexicon-based approaches, but the machine learning approach has a weakness: manual labelling, in which an annotator separates positive, negative, and neutral opinions, is time-consuming and tiring. Objective: This study aims to produce a dictionary of abusive words from local languages in Indonesia. The lexicon-based approach depends heavily on the language covered by the dictionary. Indonesia has thousands of tribes with 2500 local languages, and 80% of its population uses local languages in communication, which makes detecting hate speech on social media a significant challenge. Methods: Surveys of abusive words were conducted using proportionate stratified random sampling in 4 major tribes on the island of Java: Betawi, Sundanese, Javanese, and Madurese. Results: The experiment produced a dictionary of 250 abusive words from 4 major Indonesian tribes for detecting hate speech in Indonesian social media using the lexicon-based approach. Conclusion: Proportionate stratified random sampling in 4 major Indonesian tribes yielded 250 abusive words for lexicon-based hate speech detection.
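To illustrate what a lexicon-based approach involves, the following is a minimal sketch of dictionary lookup over tokenized text. The lexicon entries are hypothetical placeholders (the study's 250-word dictionary is not reproduced here), and the names `ABUSIVE_LEXICON`, `find_abusive_words`, `is_flagged`, and the `threshold` parameter are assumptions made for illustration, not the authors' implementation.

```python
import re

# Hypothetical placeholder entries mapping a word to its source language.
# The study's actual dictionary of 250 abusive words is not reproduced here.
ABUSIVE_LEXICON = {
    "kata_kasar_1": "Javanese",
    "kata_kasar_2": "Sundanese",
    "kata_kasar_3": "Betawi",
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def find_abusive_words(text, lexicon=ABUSIVE_LEXICON):
    """Return (token, language) pairs for every token found in the lexicon."""
    return [(tok, lexicon[tok]) for tok in tokenize(text) if tok in lexicon]


def is_flagged(text, lexicon=ABUSIVE_LEXICON, threshold=1):
    """Flag a post when it contains at least `threshold` lexicon matches.

    The threshold rule is an assumption; the abstract does not state one.
    """
    return len(find_abusive_words(text, lexicon)) >= threshold
```

Because matching is an exact lookup, coverage depends entirely on which words and spellings the dictionary contains, which is why the abstract stresses local-language coverage.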
Tourist visits to Indonesian destinations in 2021, both domestic and international, fell drastically, and the COVID-19 pandemic was one of the causes of these losses. The impact on a country is economic recession; Singapore, a country that also depends in part on tourism, experienced a fairly severe recession of up to -40%. Jatim Park Batu is a learning park and family recreation destination in Batu, East Java, and is among the well-known tourist attractions in East Java. Uncertainty in the number of tourists each month affects Jatim Park's operational management in every decision, both technical and strategic. The researchers propose using the Triple Exponential Smoothing algorithm (the Holt-Winters model), a forecasting algorithm that can account for trend and seasonal factors. Accuracy is measured with the Mean Absolute Percentage Error (MAPE). Testing was performed by initializing the alpha, beta, and gamma parameters 30 times, yielding an average MAPE of 9%.
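The forecasting approach named above can be sketched as follows. This is a minimal additive Holt-Winters implementation with a MAPE helper, written under assumptions: the abstract does not say whether the additive or multiplicative form was used, nor how the level, trend, and seasonal components were initialized, so the initialization below is just one simple, common choice. The function names and the example series are hypothetical.

```python
def holt_winters_additive(series, season_len, alpha, beta, gamma, horizon):
    """Additive triple exponential smoothing.

    Returns (fitted, forecasts): one-step-ahead fitted values for the
    observed series, and `horizon` forecasts beyond its end.
    """
    if len(series) < 2 * season_len:
        raise ValueError("need at least two full seasons of data")

    # Assumed initialization: level is the mean of the first season, trend is
    # the average per-step change between the first two seasons, and seasonal
    # offsets are deviations of the first season from the initial level.
    level = sum(series[:season_len]) / season_len
    trend = sum(
        series[season_len + i] - series[i] for i in range(season_len)
    ) / season_len ** 2
    seasonal = [series[i] - level for i in range(season_len)]

    fitted = []
    for t, y in enumerate(series):
        s = seasonal[t % season_len]
        fitted.append(level + trend + s)  # forecast made before seeing y
        prev_level = level
        level = alpha * (y - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % season_len] = gamma * (y - level) + (1 - gamma) * s

    n = len(series)
    forecasts = [
        level + (h + 1) * trend + seasonal[(n + h) % season_len]
        for h in range(horizon)
    ]
    return fitted, forecasts


def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent; actual values must be nonzero."""
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical visitor counts with a repeating four-period pattern.
visitors = [10, 20, 30, 40] * 3
fitted, forecasts = holt_winters_additive(visitors, 4, 0.5, 0.5, 0.5, horizon=4)
error = mape(visitors, fitted)
```

Repeating such a run for several alpha, beta, and gamma settings and averaging the resulting MAPE values would mirror the 30-trial evaluation described in the abstract.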
Purpose: Gathering knowledge about personality traits has long been of interest to academics and researchers in psychology and computer science. Analyzing profile data from personal social media accounts reduces data collection time, as this method does not require users to fill in any questionnaires. A pure natural language processing (NLP) approach can give decent results, and previous studies show that its reliability can be improved by combining it with machine learning. Design/methodology/approach: In this study, cleaning the dataset and extracting relevant potential features, as assessed by psychological experts, are essential, because Indonesians tend to mix formal words, non-formal words, slang and abbreviations when writing social media posts. Raw data were derived from a predefined dominance, influence, stability and conscientiousness (DISC) quiz website, returning 316,967 tweets from 1,244 Twitter accounts, filtered to include only personal and Indonesian-language accounts. Using a combination of NLP techniques and machine learning, the authors aim to develop a better approach and a more robust model, especially for the Indonesian language. Findings: The authors find that employing a SMOTETomek re-sampling technique and hyperparameter tuning boosts the model's performance on formalized datasets by 57%, as measured by the F1-score. Originality/value: Cleaning the dataset and extracting from it the relevant potential features assessed by psychological experts are essential, because Indonesian people tend to mix formal words, non-formal words, slang words and abbreviations when writing tweets. Organic data derived from a predefined DISC quiz website resulted in records for 1,244 Twitter accounts and 316,967 tweets.
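For readers unfamiliar with the metric behind the reported improvement, the following is a minimal sketch of the F1-score for a multi-class problem such as the four DISC labels. The macro averaging and the example labels are assumptions for illustration; the abstract does not state which averaging scheme the authors used.

```python
def f1_per_class(y_true, y_pred):
    """Return {label: F1} where F1 = 2*TP / (2*TP + FP + FN) for each label."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical true and predicted DISC labels for six accounts.
y_true = ["D", "I", "S", "C", "D", "I"]
y_pred = ["D", "I", "S", "C", "I", "I"]
per_class = f1_per_class(y_true, y_pred)
overall = macro_f1(y_true, y_pred)
```

Macro averaging weights every class equally, which matters on imbalanced data of the kind that re-sampling techniques such as SMOTETomek are meant to address.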