This article describes how association rule mining extracts relations between items in transactional databases and supports decision-making. However, association rule mining can pose a threat to privacy when data are shared without hiding the data owner's confidential association rules. One way of hiding an association rule is to conceal the itemsets (co-occurring items) from which the sensitive association rules are generated. These sensitive itemsets are sanitized through itemset hiding. Most existing solutions consider a single support threshold and assume that the database is static, which rarely holds in practice. In this article, the authors propose a novel itemset hiding algorithm that is designed for dynamic databases and supports multiple itemset support thresholds. The algorithm is compared with two dynamic algorithms on six different databases. The findings show that the proposed dynamic algorithm is more efficient in terms of execution time and information loss, and that it is guaranteed to hide all sensitive itemsets.
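To make the notion of per-itemset support thresholds in a dynamic database concrete, here is a minimal sketch, assuming transactions are sets of item labels and each sensitive itemset carries its own relative support threshold. The class name `IncrementalSupportTracker` and its methods are hypothetical. It only shows how support counts can be updated incrementally as new transactions are appended and how sensitive itemsets that are not yet hidden can be flagged; it is not the authors' hiding algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class IncrementalSupportTracker:
    """Keeps support counts of sensitive itemsets current as transactions arrive.

    Hypothetical helper: each sensitive itemset has its own relative support
    threshold (a fraction of all transactions seen so far).
    """

    thresholds: dict[frozenset, float]
    counts: dict[frozenset, int] = field(init=False, default_factory=dict)
    n_transactions: int = field(init=False, default=0)

    def __post_init__(self) -> None:
        self.counts = {itemset: 0 for itemset in self.thresholds}

    def add_batch(self, batch: list[set[str]]) -> None:
        # Only the newly inserted transactions are scanned; earlier counts are reused.
        for transaction in batch:
            self.n_transactions += 1
            for itemset in self.counts:
                if itemset <= transaction:
                    self.counts[itemset] += 1

    def support(self, itemset: frozenset) -> float:
        # Relative support of a tracked itemset in the database so far.
        if self.n_transactions == 0:
            return 0.0
        return self.counts[itemset] / self.n_transactions

    def exposed(self) -> list[frozenset]:
        # Sensitive itemsets whose support is not yet below their own threshold.
        return [s for s, t in self.thresholds.items() if self.support(s) >= t]


if __name__ == "__main__":
    tracker = IncrementalSupportTracker(
        {frozenset({"a", "b"}): 0.5, frozenset({"c"}): 0.5}
    )
    tracker.add_batch([{"a", "b"}, {"a", "c"}, {"b"}])
    print(sorted(sorted(s) for s in tracker.exposed()))  # []
    tracker.add_batch([{"a", "b"}, {"a", "b", "d"}])
    print(sorted(sorted(s) for s in tracker.exposed()))  # [['a', 'b']]
```

Appending a batch touches only the new transactions, which is the property a dynamic hiding algorithm relies on to avoid rescanning the whole database. Deciding which items to modify once an itemset is exposed is a separate step that this sketch leaves out.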
Purpose
The purpose of this study is to develop a review rating prediction method, based on a supervised text mining approach, for unrated customer reviews.
Design/methodology/approach
Using 2,851 hotel comment card (HCC) reviews, the authors manually labeled positive and negative comments with seven aspects (dining, cleanliness, service, entertainment, price, public, room) that emerged from the content of the reviews. After text preprocessing (tokenization, punctuation removal, stemming, etc.), two classifier models were built to predict the reviews' sentiments and aspects. An aggregate rating scale was then generated from these two classifier models to determine overall rating values.
Findings
A new algorithm based on supervised learning, Comment Rate (CRate), is proposed. Its results are compared with those of another review rating algorithm, location-based social matrix factorization (LBSMF), to check the consistency of the proposed algorithm, and the proposed algorithm predicts sentiments better than LBSMF. The performance evaluation, performed on a real data set, indicates that the CRate algorithm predicts the overall rating correctly at a rate of 80.27%. In addition, the CRate algorithm can generate an overall rating prediction scale that lets hotel management automatically analyze customer reviews and understand their sentiment.
Research limitations/implications
The review data were collected from a single resort hotel over a limited period. Therefore, this paper cannot explore the effect of the independent variables on the dependent variable over a longer period.
Practical implications
This paper provides a novel overall rating prediction technique that allows hotel management to improve its operations. With this technique, hotel management can evaluate guest feedback from HCCs more quickly and effectively, and can thereby identify faster and more effectively the service areas that need to be developed. In addition, this review rating prediction approach can be applied to customer reviews posted on online platforms to assess the reliability of reviews and ratings.
Originality/value
Manually analyzing textual information is time-consuming and can lead to measurement errors. The primary contribution of this study is therefore that, although comment cards do not have rating values, the proposed CRate algorithm can predict the overall rating and capture the sentiment of the reviews.
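The abstract states that the outputs of an aspect classifier and a sentiment classifier are combined into an overall rating scale, but it does not give the combination rule. The sketch below illustrates one possible aggregation step under that setup. The function `aggregate_rating`, the `"positive"` label string and the averaging rule are all hypothetical assumptions, not the CRate formula: each mentioned aspect is scored by its share of positive comments, and the mean aspect score is mapped linearly onto a 1 to `scale` rating.

```python
from collections import Counter

# The seven aspects named in the study.
ASPECTS = {"dining", "cleanliness", "service", "entertainment", "price", "public", "room"}


def aggregate_rating(predictions: list[tuple[str, str]], scale: int = 5) -> float:
    """Combine per-comment (aspect, sentiment) predictions into one overall rating.

    Hypothetical aggregation rule (not the CRate formula): each mentioned aspect
    is scored by its share of positive comments, the aspect scores are averaged,
    and the mean is mapped linearly from [0, 1] onto [1, scale].
    """
    positive: Counter = Counter()
    total: Counter = Counter()
    for aspect, sentiment in predictions:
        if aspect not in ASPECTS:
            raise ValueError(f"unknown aspect: {aspect!r}")
        total[aspect] += 1
        if sentiment == "positive":
            positive[aspect] += 1
    if not total:
        raise ValueError("no predictions to aggregate")
    aspect_scores = [positive[a] / total[a] for a in total]
    mean_score = sum(aspect_scores) / len(aspect_scores)
    return round(1 + mean_score * (scale - 1), 2)


if __name__ == "__main__":
    # Hypothetical classifier outputs for the comments on one comment card.
    review = [("room", "positive"), ("room", "negative"), ("service", "positive")]
    print(aggregate_rating(review))  # 4.0
```

Averaging per aspect, rather than per comment, keeps one heavily discussed aspect from dominating the overall rating; a weighted scheme would be an equally plausible choice.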
Privacy-preserving data mining (PPDM) is the process of protecting sensitive knowledge from being discovered by data mining techniques when data are shared. Privacy-preserving frequent itemset mining (PPFIM) is an NP-hard subtask of PPDM. Its objective is to modify a given database in such a way that none of the database owner's sensitive itemsets can be obtained from the modified database by any frequent itemset mining technique. The main challenge of PPFIM is to minimize the distortion introduced to the data and to nonsensitive knowledge while sanitizing all given sensitive itemsets. Distortion-based sensitive itemset hiding algorithms sanitize the database so that the support of each sensitive itemset falls below a predefined sensitive threshold. Most distortion-based itemset hiding algorithms allow the database owner to define only a single sensitive threshold that applies to every sensitive itemset. This limits the database owner, since the importance of each sensitive itemset varies. In this paper, we propose the itemset oriented pseudo graph based sanitization (IPGBS) algorithm, a distortion-based itemset hiding algorithm that allows the database owner to assign multiple sensitive thresholds. The purpose of the IPGBS algorithm is to introduce minimal distortion to the nonsensitive knowledge and data while hiding all sensitive itemsets. For this reason, the IPGBS algorithm modifies as few transactions, and as little transaction content, as possible. The IPGBS algorithm is evaluated against two counterpart algorithms on four different databases. The results show that the IPGBS algorithm is more efficient in terms of nonsensitive frequent itemset loss on both dense and sparse databases. It also achieves considerably good results in terms of the number of transactions modified, the number of items deleted, execution time and total memory allocation.
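The core idea of distortion-based hiding with multiple thresholds, deleting items until each sensitive itemset's support drops below its own threshold, can be sketched as a simple greedy procedure. This is a generic illustration under stated assumptions, not the IPGBS pseudo-graph method: the function `sanitize` is hypothetical, thresholds are relative supports greater than zero, shorter supporting transactions are modified first (an assumed heuristic to limit collateral loss), and the deleted "victim" item is the one shared by the most sensitive itemsets, with ties broken alphabetically.

```python
from collections import Counter


def sanitize(db: list[set[str]], thresholds: dict[frozenset, float]) -> list[set[str]]:
    """Greedy distortion-based hiding with one threshold per sensitive itemset.

    Hypothetical heuristic: returns a modified copy of db in which every
    sensitive itemset has relative support strictly below its own threshold.
    """
    db = [set(t) for t in db]  # work on a copy; the caller's database is untouched
    n = len(db)
    # Number of sensitive itemsets each item belongs to (used to choose victims).
    weight = Counter(item for s in thresholds for item in s)

    for itemset, threshold in thresholds.items():
        if threshold <= 0:
            raise ValueError("thresholds must be positive")
        # References to the copied transactions that still contain the itemset.
        supporting = [t for t in db if itemset <= t]
        # Prefer short transactions: fewer co-occurring items, less collateral loss.
        supporting.sort(key=len)
        # Item shared by the most sensitive itemsets; ties go to the smallest label.
        victim = max(sorted(itemset), key=lambda item: weight[item])
        while supporting and len(supporting) / n >= threshold:
            supporting.pop(0).discard(victim)
    return db


if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"c", "d"}]
    thresholds = {frozenset({"a", "b"}): 0.5, frozenset({"c", "d"}): 0.3}
    print([sorted(t) for t in sanitize(db, thresholds)])
    # [['b', 'c'], ['b'], ['a', 'b', 'd'], ['c', 'd']]
```

Because deleting items can only lower supports, hiding one sensitive itemset never re-exposes an itemset that was already processed, so a single pass over the thresholds suffices. Which transactions and items to touch is exactly where algorithms such as IPGBS differ, since those choices determine nonsensitive itemset loss.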