The electronic commerce site (EC site) has become an important marketing channel where consumers can purchase many kinds of products; their access logs, including purchase records and browsing histories, are saved in the EC sites' databases. These log data can be utilized for the purpose of web marketing. The customers who purchase many product items are good customers, whereas the other customers, who do not purchase many items, must not be good customers even if they browse many items. If the attributes of good customers and those of other customers are clarified, such information is valuable as input for making a new marketing strategy. Regarding the product items, the characteristics of good items that are bought by many users are valuable information. It is necessary to construct a method to efficiently analyze such characteristics. This paper proposes a new latent class model to analyze both purchasing and browsing histories to make latent item and user clusters. By applying the proposal, an example of data analysis on an EC site is demonstrated. Through the clusters obtained by the proposed latent class model and the classification rule by the decision tree model, new findings are extracted from the data of purchasing and browsing histories.
This paper discusses a new weighting method for text analyzing from the view point of supervised learning. The term frequency and inverse term frequency measure (tf-idf measure) is famous weighting method for information retrieval, and this method can be used for text analyzing either. However, it is an experimental weighting method for information retrieval whose effectiveness is not clarified from the theoretical viewpoints. Therefore, other effective weighting measure may be obtained for document classification problems. In this study, we propose the optimal weighting method for document classification problems from the view point of supervised learning. The proposed measure is more suitable for the text classification problem as used training data than the tf-idf measure. The effectiveness of our proposal is clarified by simulation experiments for the text classification problems of newspaper article and the customer review which is posted on the web site.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.