Global society has experienced a flood of various types of data, as well as a growing desire to discover and use this information effectively. Moreover, this data is changing in increasingly numerous

Recent developments in information processing techniques have enabled us to accumulate large-scale data, and the need to discover and use the useful information it contains is growing. Because of this, data mining, a technology for analyzing collected data to discover useful information, has attracted considerable attention. However, with the spread of the Internet and the development of sensor techniques, this data is becoming more complex and changes constantly, and the increasing amounts of data must be handled in real time. New stream mining techniques are required to process such large-scale data, which arrives intermittently and at varying intervals as a data stream.

Stream mining uses various analytical methods; in particular, classification learning is gaining considerable attention. Among the many classification learning methods that have been proposed, decision tree learning is commonly used because it is fast and the derived classifiers are easy to interpret. One decision tree learning method designed for data streams is the Very Fast Decision Tree (VFDT) [1]. As data arrives, the tree grows gradually while the data is classified. Credit card transaction data can be regarded as a data stream, so fraudulent use can be detected by classifying transaction data with the VFDT. However, some data, such as the credit card transaction data discussed in this study, has an extremely imbalanced class distribution. When such data is used as a data stream, some problems can reduce the accuracy of the VFDT [2,3].

In this study, we propose a node construction algorithm that is applicable to imbalanced distribution data streams.
We also implement and evaluate criteria for constructing nodes.

This paper is organized as follows. Section 2 explains the VFDT. Section 3 describes our proposed method, which constructs a VFDT from imbalanced distribution data streams. Section 4 verifies the effectiveness of the proposed method through experiments. Section 5 presents and discusses the experimental results. The final section concludes the paper and discusses future work.

Related works

Classification is one of the most common tasks in data mining. The main classification methods currently in use include decision trees, neural networks, logistic regression, nearest neighbors, and support vector machines. Decision trees are recognized as very effective and attractive classification tools, mainly because