Comments Mining With TF-IDF: The Inherent Bias and Its Removal

Yahav, Inbal; Shehory, Onn; Schwartz, David G.

doi:10.1109/tkde.2018.2840127

Cited by 82 publications

(45 citation statements)

References 49 publications

“…The research on the mining of attribute-opinion word pairs has attracted wide attention, mainly including the following three aspects: (a) The mining of attribute-opinion word pairs is regarded as a task of "keyword" extraction, and these keywords are extracted with unsupervised methods, for example, latent Dirichlet allocation (LDA), 12,13 TextRank, 14,15 and term frequency-inverse document frequency (TF-IDF). 16,17 However, those unsupervised methods have their limitations. That is only words can be accurately extracted rather than research contents in a certain context, and phrases cannot be analyzed.…”

Section: Attribute-opinion Pairs Mining (mentioning)

confidence: 99%

A decision‐making algorithm for online shopping using deep‐learning–based opinion pairs mining and q‐rung orthopair fuzzy interaction Heronian mean operators

Yang, Ouyang, et al. 2020

Int J Intell Syst

In the process of online shopping, consumers usually compare the review information of the same product on different e‐commerce platforms. The sentiment orientations of online reviews from different platforms interactively influence consumers’ purchase decisions. However, because the ability to process information manually is limited, it is difficult for a consumer to accurately identify the sentiment orientation of every review one by one and to describe how the reviews interact. To this end, we propose an online shopping support model using deep‐learning–based opinion mining and q‐rung orthopair fuzzy interaction weighted Heronian mean (q‐ROFIWHM) operators. First, the deep‐learning model is used to automatically extract product attribute words and opinion words from online reviews and to match the corresponding attribute‐opinion pairs; meanwhile, a sentiment dictionary is used to calculate sentiment orientation, including positive, negative, and neutral sentiments. Second, the proportions of the three kinds of sentiment for each attribute of the same product are calculated. According to the proportion values of attribute sentiment from different platforms, the sentiment information is converted into multiple cross‐decision matrices, which are represented by the q‐rung orthopair fuzzy set. Third, considering the interactive characteristics of the decision matrices, the q‐ROFIWHM operators are proposed to aggregate this cross‐decision information, and the ranking result is then determined by a score function to support consumers' purchase decisions. Finally, an actual example of a mobile phone purchase is given to verify the rationality of the proposed method, and sensitivity and comparison analyses are used to show its effectiveness and superiority.

“…The TF-IDF algorithm is used to compute comment weights and to classify the comments into 2 classes (potential comments and non-potential comments). TF-IDF weighting is commonly used in text mining and information retrieval to evaluate the importance of linguistic terms (usually unigrams or bigrams) in the corpus under study [14]. The TF-IDF calculation uses equation 1.…”

Section: Login (unclassified)

Analisis Komentar Potensial pada Social Commerce Instagram Menggunakan TF-IDF

Musyarofah, Utami, Raharjo

2020

eksplora

Comments on Instagram are valuable, informative, and very helpful. For sellers, comments are a feature that shows how Instagram users respond to the products on offer, and through the comment feature sellers can find potential customers. These benefits are obtained when sellers analyze the comments on their Instagram store. Manual analysis is feasible when a store has few comments, but when there are many comments a system is faster. A large amount of spam can obscure the information in the comments, so a large number of comments on a post does not guarantee that many people want to buy the product. A system is therefore needed that can filter comments so that sellers can find the right customers for their products. This study uses the TF-IDF algorithm to classify comments into 2 classes (potential and non-potential) and obtains an accuracy of 80%, a precision of 0.76, and a recall of 0.94. Of the 294 comments studied, 27% are non-potential comments. The words that indicate a person's purchase interest are “berapa” (how much), “kak” (an informal term of address), “ada” (there is, available), and “tidak” (not), while the dominant word in non-potential comments is “mention”, which indicates mention activity.

“…TF is the number of occurrences of each word in each document, and IDF represents the number of documents that contain a particular word based on the number of words in the text. The TF-IDF calculation can be performed using the formula in equation 1 [12].…”

Section: Feature Extraction (unclassified)

Klasifikasi Topik Multi Label pada Hadis Shahih Bukhari Menggunakan K-Nearest Neighbor dan Latent Semantic Analysis

2020

Hadith is the second source of Islamic law after the Al-Quran, which makes it important to study. However, learning hadith poses some difficulties, such as determining whether a hadith belongs to the topic of suggestions, prohibitions, or information. This obstructs the hadith learning process, especially for Muslims. It is therefore necessary to classify hadiths into the topics of suggestions, prohibitions, information, and combinations of the three, which is also called multi-label topic classification. The classification can be done with K-Nearest Neighbor (KNN), one of the best methods in the Vector Space Model and a simple but quite effective one. However, KNN handles high-dimensional vectors poorly, which results in long classification times. For that reason, this research classifies Sahih Bukhari's hadiths into the topics of recommendations, prohibitions, and information using the Latent Semantic Analysis - K-Nearest Neighbor (LSA-KNN) method. The Binary Relevance method is also employed to process the multi-label data. The results show a performance of 90.28% with a computation time of 19 minutes 21 seconds for LSA-KNN, and a performance of 90.23% with a computation time of 37 minutes 06 seconds for KNN, which means that LSA-KNN performs better than KNN.
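
Two of the citation statements above describe TF-IDF weighting in words and point to an "equation 1" that is not reproduced on this page. As a rough illustration only, the sketch below uses the common textbook form, weight = tf × log(N / df), which is an assumption and may differ from the equation either citing paper actually uses. The function name `tf_idf` and the toy comments are hypothetical and are not data from any of the papers.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. Assumes the textbook weighting
    tf * log(N / df); the citing papers' own equation is not shown here.
    """
    n_docs = len(docs)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        tf = Counter(tokens)  # raw term counts within this document
        weights.append(
            {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return weights


# Hypothetical toy comments, made up for illustration.
toy_comments = [
    "berapa harga kak".split(),
    "ada stok kak".split(),
    "mention mention teman".split(),
]
weights = tf_idf(toy_comments)

for comment, w in zip(toy_comments, weights):
    print(comment, w)
```

In this form, a term that occurs in every document gets weight 0 because log(N / N) = 0, while a term repeated inside a single document is weighted up. How such weights are then turned into a two-class decision is not described in the excerpts above, so the sketch stops at the weighting step.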
