Fuzzy Hoeffding Decision Tree for Data Stream Classification

Ducange, Pietro; Marcelloni, Francesco; Pecori, Riccardo

doi:10.2991/ijcis.d.210212.001

Cited by 24 publications

(13 citation statements)

References 39 publications

“…On the other hand, the second group of data sets was built based on real occupancy data set, which is available in the repository of the University of California in Irvine (UCI) [17]. There are three files in the directory downloaded from the mentioned source: one is a training data set and the two others are test data sets [21]. Due to the fact that in this study the k-fold cross validation method was used, those files were merged, and then based on that, the test and training sets were created.…”

Section: Evaluation: Platform Data Sets Methods and Metrics (mentioning)

confidence: 99%

Variant of Data Particle Geometrical Divide for Imbalanced Data Sets Classification by the Example of Occupancy Detection

Rybak

Dudczyk

2021

Applied Sciences

The history of gravitational classification started in 1977. Over the years, the gravitational approaches have reached many extensions, which were adapted into different classification problems. This article is the next stage of the research concerning the algorithms of creating data particles by their geometrical divide. In the previous analyses it was established that the Geometrical Divide (GD) method outperforms the algorithm creating the data particles based on classes by a compound of 1 ÷ 1 cardinality. This occurs in the process of balanced data sets classification, in which class centroids are close to each other and the groups of objects, described by different labels, overlap. The purpose of the article was to examine the efficiency of the Geometrical Divide method in the unbalanced data sets classification, by the example of a real case: occupancy detection. In addition, in the paper, the concept of the Unequal Geometrical Divide (UGD) was developed. The evaluation of approaches was conducted on 26 unbalanced data sets: 16 with the features of Moons and Circles data sets and 10 created based on a real occupancy data set. In the experiment, the GD method and its unbalanced variant (UGD), as well as the 1CT1P approach, were compared. Each method was combined with three data particle mass determination algorithms: n-Mass Model (n-MM), Stochastic Learning Algorithm (SLA), and Batch-update Algorithm (BLA). The k-fold cross validation method, precision, recall, F-measure, and the number of used data particles were applied in the evaluation process. Obtained results showed that the methods based on geometrical divide outperform the 1CT1P approach in the imbalanced data sets classification. The article’s conclusion describes the observations and indicates the potential directions of further research and development of methods, which concern creating the data particle through its geometrical divide.
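
The citation statement above describes merging one training file and two test files before applying k-fold cross validation. A minimal sketch of that idea follows, assuming a plain shuffled (non-stratified) split; the record lists and the function name `k_fold_splits` are hypothetical stand-ins, not the citing authors' code or data.

```python
import random

def k_fold_splits(records, k, seed=0):
    """Yield (train, test) pairs so that each record is tested exactly once."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    # Round-robin slicing keeps the fold sizes within one record of each other.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, test

# Hypothetical stand-ins for the one training file and the two test files.
training_file = [("row", i) for i in range(6)]
test_file_1 = [("row", i) for i in range(6, 10)]
test_file_2 = [("row", i) for i in range(10, 15)]

# Merge first, then let cross validation create the train/test sets.
merged = training_file + test_file_1 + test_file_2
splits = list(k_fold_splits(merged, k=5))
```

For imbalanced data such as the occupancy sets, a stratified split that preserves class proportions in every fold would usually be preferable; the statement does not say which variant was used.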

“…Moreover, fuzzy clustering has been preferred to hard clustering, due to its capability to better represent changes in data, which is a critical factor for stream data [1]. Indeed, for this reason, several extensions of fuzzy clustering algorithms have been proposed for data stream [10,17,27].…”

Section: Methods (mentioning)

confidence: 99%
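
To illustrate the contrast between fuzzy and hard clustering drawn in the statement above, the following sketch computes the standard fuzzy c-means membership degrees of one point with respect to fixed cluster centers. It is an assumption-laden illustration: the function name `fuzzy_memberships`, the fuzzifier `m = 2.0`, and the sample points are hypothetical, and the streaming extensions cited in the statement are not reproduced.

```python
import math

def fuzzy_memberships(point, centers, m=2.0):
    """Return fuzzy c-means membership degrees of `point` for each center."""
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that center.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    exponent = 2.0 / (m - 1.0)
    # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1))
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]

centers = [(0.0, 0.0), (4.0, 0.0)]
memberships = fuzzy_memberships((1.0, 0.0), centers)

# Hard clustering keeps only the index of the closest center.
hard_label = max(range(len(centers)), key=lambda i: memberships[i])
```

The membership vector retains how close the point is to every cluster, whereas the hard label discards that information; this graded view is one way to read the "better represent changes in data" claim.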

Balancing Data Within Incremental Semi-supervised Fuzzy Clustering for Credit Card Fraud Detection

Casalino

Castellano

Marvulli

2021

Atlantis Studies in Uncertainty Modelling

As the number of online financial transactions increases, the problem of credit card fraud detection has become quite urgent. Machine learning methods, including supervised and unsupervised approaches, have been proven to be effective to detect fraudulent activities. In our previous work presented at EUSFLAT2019 we proposed the use of an incremental semi-supervised fuzzy clustering that processes both labeled and unlabeled data as a stream to create a classification model for credit card fraud detection. However, we observed that the results of the method were affected by data imbalance. Indeed, credit card fraud data are highly imbalanced since the number of fraudulent activities is far less than the genuine ones. In this work, to deal with the high data imbalance, different resampling methods are investigated and their empirical comparison is reported.

“…In this paper, we exploit two incremental decision trees suitable for data stream mining and classification, namely the Hoeffding Decision Tree (HDT) [19] and its fuzzy extension (Fuzzy Hoeffding Decision Tree -FHDT) introduced in [8] and deeply experimented in [9]. Although a tree is inherently interpretable, as stated above, fuzzy trees have been introduced because their usage of linguistic partitions on the attributes makes the resulting rules more explainable, given that each edge exiting a node can be associated with a proper and meaningful linguistic term.…”

Section: B Incremental Decision Trees (mentioning)

confidence: 99%
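
For readers unfamiliar with the Hoeffding Decision Tree named in the statement above, the sketch below shows the split test commonly associated with the crisp (non-fuzzy) algorithm: a leaf is split once the gap between the two best attribute gains exceeds the Hoeffding bound. This is a hedged illustration, not the FHDT procedure; the function names, the use of information gain with range log2 of the class count, and the sample numbers are assumptions.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Epsilon such that the observed mean of n samples is within epsilon
    of the true mean with probability at least 1 - delta."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain, second_gain, n_classes, delta, n):
    """Split when the best attribute beats the runner-up by more than epsilon."""
    # Assumption: the split metric is information gain, whose range is log2(classes).
    value_range = math.log2(n_classes)
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)

# Hypothetical gains observed at a leaf of a two-class problem.
early = should_split(0.30, 0.05, n_classes=2, delta=1e-7, n=100)
later = should_split(0.30, 0.05, n_classes=2, delta=1e-7, n=200)
```

With these illustrative numbers the leaf is not split after 100 instances but is split after 200, because the bound shrinks as more stream instances arrive.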

“…In fact, by using this strategy, only one rule is used to make a classification decision. More details on FHDTs can be found in [8] [9].…”

Section: B Incremental Decision Trees (mentioning)

confidence: 99%

“…In our car driver identification system, FDTs will support stakeholders to understand the motivations behind a specific decision and to obtain useful information about the profile of a specific driver. Moreover, we adopt an incremental version of FDTs, recently introduced in [8], [9], which allows us to continuously update the classification models, embedded in our system, while new trusted data streams, regarding a specific driver, arrive.…”

Section: Introduction (mentioning)

confidence: 99%

An Explainable and Evolving Car Driver Identification System based on Decision Trees

Bernardi

Cimitile

Ducange

et al.

2022

2022 IEEE International Conference on Evolving and Adaptive Intelligent Systems (EAIS)

Shared mobility represents a more and more widespread model ensuring several advantages for citizens and reducing gas emissions. The birth of car-sharing models drives the necessity to use car monitoring systems able to reduce the possibility that unauthorized people drive a certain car. In this paper, we discuss the architecture of car driver identification systems based on incremental fuzzy decision trees. The main features of the proposed system are i) the explainability, namely the possibility of giving explanations regarding its decisions, provided in terms of linguistic rules, and ii) the possibility of continuously updating the classification model. We show the preliminary results of an experimental campaign in which we compare both fuzzy and non-fuzzy incremental decision trees, both in terms of classification performance and model complexity/explainability.
