Joan Santoso scite author profile

Abstract: News, a kind of information needed in daily life, is freely available on the internet. News websites often group their articles by topic to help users find the news they need more easily. Document classification has been widely used to do this grouping automatically. In real-world deployments, however, the labeled training data available is often insufficient for a machine to build a good classification model, because annotating enough data requires considerable cost and time. Semi-supervised algorithms address this problem by using both labeled and unlabeled data to build the classification model. This paper proposes a semi-supervised news classification system using the Self-Training Naive Bayes algorithm. The features used for text classification come from the Word2Vec Skip-Gram model, which is widely used in computational linguistics or text mining research as a word representation method. Word2Vec is used as a feature because it carries the semantic meaning of words into the classification task. The data used in this paper consists of 29,587 news documents from Indonesian online news websites. The Self-Training Naive Bayes algorithm achieved the highest F1-Score of 94.17%.
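As a rough illustration of the self-training idea described in this abstract, the sketch below fits a Naive Bayes model on labeled vectors, pseudo-labels the unlabeled vectors it is confident about, and refits. It is not the authors' implementation: the Gaussian likelihood, the 0.9 confidence threshold, the variance floor, and all function names are assumptions made for illustration, and the inputs stand in for document vectors such as averaged Word2Vec embeddings.

```python
import math

def fit_gaussian_nb(X, y):
    """Fit per-class priors, feature means, and variances (assumed Gaussian)."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        n, dims = len(rows), len(rows[0])
        means = [sum(r[d] for r in rows) / n for d in range(dims)]
        # A small floor (assumption) keeps variances positive for tiny classes.
        variances = [sum((r[d] - means[d]) ** 2 for r in rows) / n + 1e-3
                     for d in range(dims)]
        model[label] = (n / len(X), means, variances)
    return model

def predict_proba(model, x):
    """Return {label: posterior probability} for one feature vector."""
    log_post = {}
    for label, (prior, means, variances) in model.items():
        lp = math.log(prior)
        for v, m, s in zip(x, means, variances):
            lp += -0.5 * math.log(2 * math.pi * s) - (v - m) ** 2 / (2 * s)
        log_post[label] = lp
    top = max(log_post.values())  # subtract the max for numerical stability
    scores = {k: math.exp(v - top) for k, v in log_post.items()}
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()}

def self_train(X_lab, y_lab, X_unlab, threshold=0.9, max_rounds=10):
    """Pseudo-label confident unlabeled points, add them, and refit."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    model = fit_gaussian_nb(X_lab, y_lab)
    for _ in range(max_rounds):
        remaining, added = [], 0
        for x in pool:
            probs = predict_proba(model, x)
            label = max(probs, key=probs.get)
            if probs[label] >= threshold:
                X_lab.append(x)
                y_lab.append(label)
                added += 1
            else:
                remaining.append(x)
        pool = remaining
        if added == 0:  # nothing confident enough: stop early
            break
        model = fit_gaussian_nb(X_lab, y_lab)
    return model, pool
```

Calling `self_train(X_lab, y_lab, X_unlab)` returns the final model and any unlabeled vectors that never reached the threshold. A lower threshold adds pseudo-labels faster but risks reinforcing early mistakes.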

Named entity recognition for extracting concept in ontology building on Indonesian language using end-to-end bidirectional long short term memory

Santoso

Setiawan

Purwanto

et al. 2021

Expert Systems with Applications

Klasifikasi Helpdesk Menggunakan Metode Support Vector Machine

Kusumahadi

Junaedi

Santoso

2019

jpit

Online helpdesks that rely on operators and a ticketing system often run into problems such as tickets being delegated inappropriately, long waits before a ticket is delegated, and tickets being missed altogether. When operators check and delegate tickets manually, there is a risk of assigning them to the wrong technicians. A helpdesk classification system is therefore needed so that every incoming ticket can be classified and delegated to the right technician according to the job description. Incoming tickets are classified into six types of requests (multimedia, documentation, internet, server, hardware, and software), plus a miscellaneous category. This grouping lets the technicians responsible for each type handle tickets according to their respective job descriptions. The Support Vector Machine method is used to classify the helpdesk text. The Linear and Polynomial kernels produce an accuracy of 78%, the RBF (Gaussian) kernel produces the highest accuracy of 81%, and the Sigmoid kernel produces the lowest accuracy of 51%. Overall, helpdesk classification with the Support Vector Machine method produces reasonably good accuracy.
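For readers unfamiliar with the four kernels compared in this abstract, the sketch below writes out their standard textbook definitions for two feature vectors. The hyperparameters `gamma`, `coef0`, and `degree`, and their default values, are assumptions; the abstract does not report the settings used.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(u, v):
    # K(u, v) = <u, v>
    return dot(u, v)

def polynomial_kernel(u, v, gamma=1.0, coef0=1.0, degree=3):
    # K(u, v) = (gamma * <u, v> + coef0) ** degree
    return (gamma * dot(u, v) + coef0) ** degree

def rbf_kernel(u, v, gamma=1.0):
    # K(u, v) = exp(-gamma * ||u - v||^2), also called the Gaussian kernel
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

def sigmoid_kernel(u, v, gamma=1.0, coef0=0.0):
    # K(u, v) = tanh(gamma * <u, v> + coef0)
    return math.tanh(gamma * dot(u, v) + coef0)
```

Each function measures similarity between two ticket vectors in a different way, which is why swapping the kernel alone can change the accuracy as much as the abstract reports.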

Bidirectional GRU for Targeted Aspect-Based Sentiment Analysis Based on Character-Enhanced Token-Embedding and Multi-Level Attention

Setiawan

Ferry

Santoso

et al. 2020

IJIES

User feedback on healthcare services is usually based on ratings from post-service questionnaires. However, to get a clear view of the user's perspective, online text reviews also need to be analyzed. We combined targeted and aspect-based sentiment analysis through multi-level attention to obtain the user's sentiment toward a specific aspect of a specific target. The multi-level attention consists of target-level and sentence-level attention. Our proposed framework is based on the Bidirectional Gated Recurrent Unit (Bi-GRU), which is commonly reported to give results comparable to LSTM at lower computational complexity. We also used a Bidirectional LSTM based Character-Enhanced Token-Embedding to handle out-of-vocabulary words and misspellings, which would otherwise cause errors in sentiment detection. We created a dataset of online healthcare reviews from 2018 to 2020, targeting the name of the hospital or department, with ten aspects: cleanliness, cost, doctor, food, nurse, parking, receptionist and billing, safety, test and examination, and waiting time. To improve the results of our proposed method, we calculated polarity weights to handle imbalanced aspects in the dataset. We classified the reviews into three polarities: positive, negative, and neutral. In our experiments, the best F1-Score achieved was 88%.
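The abstract does not give the formula behind its polarity weight. One common way to weight imbalanced classes, shown here only as an assumed stand-in, is inverse-frequency ("balanced") weighting, where rarer polarities receive larger weights in the loss. The function name and the example label counts are hypothetical.

```python
from collections import Counter

def balanced_weights(labels):
    """Inverse-frequency weights: total / (num_classes * class_count).

    This is an assumed scheme for illustration, not necessarily the
    polarity weight used in the paper.
    """
    counts = Counter(labels)
    total, num_classes = len(labels), len(counts)
    return {label: total / (num_classes * count)
            for label, count in counts.items()}

# Hypothetical polarity labels for one aspect, skewed toward "positive".
labels = ["positive"] * 6 + ["negative"] * 3 + ["neutral"] * 1
weights = balanced_weights(labels)
```

With these counts the rare "neutral" class gets the largest weight and the frequent "positive" class the smallest, so errors on rare polarities cost more during training.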


Joan Santoso

Large Scale Text Classification Using Map Reduce and Naive Bayes Algorithm for Domain Specified Ontology Building

Self-Training Naive Bayes Berbasis Word2Vec untuk Kategorisasi Berita Bahasa Indonesia

Named entity recognition for extracting concept in ontology building on Indonesian language using end-to-end bidirectional long short term memory

Klasifikasi Helpdesk Menggunakan Metode Support Vector Machine

Bidirectional GRU for Targeted Aspect-Based Sentiment Analysis Based on Character-Enhanced Token-Embedding and Multi-Level Attention
