Arabic Book Retrieval using Class and Book Index Based Term Weighting

Tibyani

et al. 2018

IJECE

Self Cite

The 2013 curriculum is a new curriculum in the Indonesian education system, enacted by the government to replace the KTSP curriculum. Its implementation over the last few years has sparked varied opinions among students, teachers, and the general public, especially on the social media platform Twitter. In this study, a sentiment analysis of the 2013 curriculum is conducted. An ensemble of several feature sets was used for sentiment classification with the K-Nearest Neighbor method: Twitter-specific features, textual features, Part-of-Speech (POS) features, lexicon-based features, and Bag of Words (BOW) features. The experimental results showed that the ensemble features give better sentiment classification performance than any individual feature set. The best accuracy using ensemble features is 96%, obtained with k=5.
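The idea of classifying a tweet with K-Nearest Neighbor over an ensemble of feature sets can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the two toy feature groups, their values, and the use of Euclidean distance are all assumptions made for the example.

```python
import math
from collections import Counter


def combine_features(*feature_groups):
    """Concatenate several per-tweet feature vectors into one ensemble vector."""
    combined = []
    for group in feature_groups:
        combined.extend(group)
    return combined


def knn_predict(train_vectors, train_labels, query, k=5):
    """Label `query` by majority vote among its k nearest training vectors.

    Euclidean distance is an assumption; the abstract does not name a metric.
    """
    distances = sorted(
        (math.dist(vector, query), label)
        for vector, label in zip(train_vectors, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: each tweet has a Twitter-specific group
# [has_hashtag, has_emoticon] and a lexicon group [positive_words, negative_words].
train_vectors = [
    combine_features([1, 0], [3, 0]),
    combine_features([0, 1], [2, 0]),
    combine_features([1, 1], [4, 1]),
    combine_features([0, 0], [0, 3]),
    combine_features([1, 0], [0, 2]),
    combine_features([0, 0], [1, 4]),
]
train_labels = ["positive", "positive", "positive",
                "negative", "negative", "negative"]

query = combine_features([1, 0], [3, 1])
prediction = knn_predict(train_vectors, train_labels, query, k=5)
print(prediction)
```

Concatenation is the simplest way to form an ensemble vector. In practice the feature groups would likely need scaling so that no single group dominates the distance.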

Section: Methods

mentioning

confidence: 99%

Twitter Sentiment Analysis on 2013 Curriculum Using Ensemble Features and K-Nearest Neighbor

Irfan

Tibyani

et al. 2018

IJECE

Self Cite

“…This process aims to separate each word to distinguish certain characters that are treated as word separators or not. The tokenizing process relies on the space character in the document as a word separator [15]. b) Filtering…”

Section: Text Preprocessing

mentioning

confidence: 99%

Cyberbullying identification in twitter using support vector machine and information gain based feature selection

Purnamasari

Indriati

et al. 2020

IJEECS

Self Cite

Cyberbullying is one of the actions that violate the ITE Law, where the crime is committed on social media applications such as Twitter. It is difficult to detect if no one reports the tweet. Cyberbullying tweet identification aims to classify tweets that contain bullying. Classification is done using the Support Vector Machine (SVM) method, which aims to find the separating hyperplane between the negative and positive classes. Because this is a text classification task, the more data is used, the more features are produced; this research therefore also uses Information Gain as a feature selection step to identify features that are not relevant to the classification. The system starts with text preprocessing: tokenizing, filtering, stemming, and term weighting. It then performs Information Gain feature selection by calculating the entropy value of each term. After that, it performs classification based on the selected terms, and the output of the system is an identification of whether the tweet is bullying or not. The SVM method achieves 75% accuracy, 70.27% precision, 86.66% recall, and 77.61% F-measure in the experiment with maximum iteration = 20, λ = 0.5, γ = 0.001, ε = 0.000001, and C = 1. The best Information Gain threshold is 90%, with 76.66% accuracy, 72.22% precision, 86.66% recall, and 78.78% F-measure.
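The entropy-based Information Gain selection described in this abstract can be sketched as below. This is an illustrative sketch under stated assumptions, not the paper's code: the function names, the toy tweets, and the reading of the threshold as "fraction of top-ranked terms to keep" (`keep_fraction`) are all hypothetical, since the abstract does not define how its 90% threshold is applied.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(docs, labels, term):
    """Entropy reduction from splitting documents by presence of `term`."""
    with_term = [label for doc, label in zip(docs, labels) if term in doc]
    without_term = [label for doc, label in zip(docs, labels) if term not in doc]
    total = len(labels)
    conditional = (
        len(with_term) / total * entropy(with_term)
        + len(without_term) / total * entropy(without_term)
    )
    return entropy(labels) - conditional


def select_terms(docs, labels, keep_fraction):
    """Keep the top-scoring fraction of the vocabulary (assumed threshold meaning)."""
    vocabulary = sorted({term for doc in docs for term in doc})
    ranked = sorted(
        vocabulary,
        key=lambda term: information_gain(docs, labels, term),
        reverse=True,
    )
    keep = max(1, round(len(ranked) * keep_fraction))
    return ranked[:keep]


# Hypothetical, already preprocessed tweets represented as sets of tokens.
docs = [{"you", "stupid"}, {"stupid", "loser"}, {"nice", "day"}, {"good", "day"}]
labels = ["bullying", "bullying", "neutral", "neutral"]

for term in ("stupid", "you"):
    print(term, round(information_gain(docs, labels, term), 4))
```

Terms that split the classes cleanly (here "stupid" and "day") score highest, while terms that appear in only one tweet carry less information. The surviving terms would then be passed to the classifier.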

“…Preprocessing is conducted before the main process begin. Some steps conducted in this stage including tokenization, case folding and cleaning [40][41][42][43]. In tokenization, each review is splitted into smaller units called tokens or terms [44].…”

Section: Preprocessing

mentioning

confidence: 99%

Word2Vec model for sentiment analysis of product reviews in Indonesian language

2019

IJECE

Self Cite

Online product reviews have become a highly valuable source of information for consumers making purchase decisions and for producers improving their products and marketing strategies. However, as the number of available reviews increases, it becomes more and more difficult for people to manually understand and evaluate the general opinion about a particular product. Hence, an automatic approach is preferred. One of the most popular techniques is a machine learning approach such as Support Vector Machine (SVM). In this study, we explore the use of the Word2Vec model as features in SVM-based sentiment analysis of product reviews in the Indonesian language. The experimental results show that SVM performs well on the sentiment classification task with any of the models used. However, the Word2Vec model has the lowest accuracy (only 0.70) compared to the baseline methods, which include the Bag of Words model using Binary TF, Raw TF, and TF.IDF. This is because only a small dataset was used to train the Word2Vec model. Word2Vec needs a large number of examples to learn the word representation and place similar words closer together.
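The three Bag of Words baselines named in this abstract (Binary TF, Raw TF, and TF.IDF) differ only in how each term is weighted. The sketch below illustrates the difference on two hypothetical tokenized Indonesian reviews. The function names and data are assumptions, and the TF.IDF variant shown (raw count times the natural log of N/df) is one common formulation; the abstract does not specify which variant was used.

```python
import math
from collections import Counter


def binary_tf(tokens, vocabulary):
    """1 if the term occurs in the document, else 0."""
    present = set(tokens)
    return [1 if term in present else 0 for term in vocabulary]


def raw_tf(tokens, vocabulary):
    """Number of times each term occurs in the document."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def tf_idf(docs, vocabulary):
    """Raw TF times log(N / df); this IDF variant is an assumption."""
    n_docs = len(docs)
    doc_freq = {term: sum(1 for doc in docs if term in doc) for term in vocabulary}
    idf = {
        term: math.log(n_docs / doc_freq[term]) if doc_freq[term] else 0.0
        for term in vocabulary
    }
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        weighted.append([counts[term] * idf[term] for term in vocabulary])
    return weighted


# Hypothetical tokenized reviews ("barang bagus bagus", "barang rusak").
docs = [["barang", "bagus", "bagus"], ["barang", "rusak"]]
vocabulary = sorted({term for doc in docs for term in doc})

print(vocabulary)
print(binary_tf(docs[0], vocabulary))
print(raw_tf(docs[0], vocabulary))
print(tf_idf(docs, vocabulary)[0])
```

A term such as "barang", which appears in every review, receives a TF.IDF weight of zero under this variant, while Binary TF and Raw TF still count it.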