Abstract: The most important step in analyzing sentiment in text is assigning polarity to opinion words. Polarity is the positive, negative, or neutral state of an opinion word. There are many methods for determining the polarity of opinion words, one of which is the lexicon-based method. A lexicon is a digital library of opinion words together with their polarities. There are three approaches to developing a lexicon: manual, dictionary-based, and corpus-based. For the Malay language, no sentiment lexicon is available and sources are very limited. Thus, in this study we present automated lexicon generation for the Malay language using the dictionary-based approach and describe it in detail.
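To make the dictionary-based idea concrete, the following is a minimal sketch of seed expansion as it is commonly described in general: polarity spreads from a few hand-labelled seed words through synonym and antonym links. It is not the authors' implementation. The function `expand_lexicon`, the `max_depth` parameter, and the tiny Malay word lists are all hypothetical and serve only as illustration.

```python
from collections import deque


def expand_lexicon(seeds, synonyms, antonyms, max_depth=2):
    """Spread polarity from seed words via synonym/antonym links (breadth-first)."""
    lexicon = dict(seeds)
    queue = deque((word, 0) for word in seeds)
    while queue:
        word, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand beyond the chosen depth
        polarity = lexicon[word]
        # Synonyms inherit the polarity; antonyms receive the opposite sign.
        neighbours = [(w, polarity) for w in synonyms.get(word, [])]
        neighbours += [(w, -polarity) for w in antonyms.get(word, [])]
        for other, other_polarity in neighbours:
            if other not in lexicon:  # the first label assigned is kept
                lexicon[other] = other_polarity
                queue.append((other, depth + 1))
    return lexicon


# Hypothetical toy data: +1 = positive, -1 = negative.
seeds = {"baik": 1, "buruk": -1}
synonyms = {"baik": ["bagus", "elok"], "buruk": ["teruk"]}
antonyms = {"baik": ["jahat"]}

lexicon = expand_lexicon(seeds, synonyms, antonyms)
for word, polarity in lexicon.items():
    print(word, polarity)
```

In this sketch a word keeps the first polarity it receives, and expansion stops after `max_depth` hops, since labels tend to become less reliable the further they travel from the seeds. A real system would draw the synonym and antonym maps from an actual Malay dictionary or thesaurus.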
The extensive development of Web 2.0 has led to the production of a gigantic amount of user-generated data, which contains much useful information. Manually analyzing these data and classifying the sentiment in them is an exhausting task; thus, opinion mining methods are needed. Opinion mining uses natural language processing (NLP), in which part-of-speech (POS) tagging is a crucial part. The performance of any NLP system depends on the accuracy of its POS tagger. Two main issues affect the accuracy of a POS tagger: unknown words and ambiguity. Although research on POS tagging dates back several decades, it has mostly focused on English. Research on the Malay language is still at an early stage. Also, online Malay text differs from proper Malay text in both structure and grammar. Online users tend to use many abbreviations and short forms in their text. In addition, the "Bahasa Rojak" phenomenon complicates the tagging process even further. Taking all of this into consideration, in this study we review stochastic and rule-based POS tagging methodologies for dealing with ambiguous and unknown words in online Malay text.