This paper presents a novel ensemble learning method based on evolutionary algorithms to cope with different types of concept drift in non-stationary data stream classification tasks. In ensemble learning, multiple learners forming an ensemble are trained to obtain better predictive performance than a single learner, especially in non-stationary environments, where data evolve over time. The evolution of data streams can be viewed as a problem of a changing environment, and evolutionary algorithms offer a natural solution to this problem. The proposed method uses random subspaces of features drawn from a pool of features to create different classification types in the ensemble. Each type consists of a limited number of classifiers (decision trees) that have been built at different times over the data stream. An evolutionary algorithm (replicator dynamics) is used to adapt to different concept drifts; it allows the types with higher performance to grow and those with lower performance to shrink in size. A genetic algorithm is then applied to build a two-layer architecture on top of this technique, dynamically optimising the combination of features in each type to achieve better adaptation to new concepts. The proposed method, called EACD, offers both implicit and explicit mechanisms to deal with concept drifts. A set of experiments employing four artificial and five real-world data streams is conducted to compare its performance with that of state-of-the-art algorithms using the immediate and delayed prequential evaluation methods. The results demonstrate favourable performance of the proposed EACD method in different environments.
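As a rough illustration of how replicator dynamics lets better-performing types grow and weaker ones shrink, the sketch below applies the textbook discrete-time replicator update to the sizes of the classification types. It is not taken from the EACD paper: the function name `replicator_step`, the use of recent accuracy as fitness, and the fractional (unrounded, uncapped) sizes are all assumptions made for illustration.

```python
def replicator_step(sizes, fitness):
    """One discrete-time replicator update of type sizes (illustrative only).

    sizes   -- current number of classifiers in each type
    fitness -- non-negative performance score per type (assumed: recent accuracy)
    Returns new fractional sizes with the same total as `sizes`.
    """
    if len(sizes) != len(fitness):
        raise ValueError("sizes and fitness must have the same length")
    total = sum(sizes)
    if total == 0:
        return [0.0 for _ in sizes]

    # Share of the ensemble currently held by each type.
    shares = [s / total for s in sizes]

    # Population-average fitness, weighted by the current shares.
    mean_fitness = sum(x * f for x, f in zip(shares, fitness))
    if mean_fitness == 0:
        # No type scored anything, so there is no signal to redistribute on.
        return [float(s) for s in sizes]

    # Types above the average gain share; types below it lose share.
    new_shares = [x * f / mean_fitness for x, f in zip(shares, fitness)]
    return [total * x for x in new_shares]


if __name__ == "__main__":
    # Three equally sized types with different (hypothetical) accuracies.
    print(replicator_step([10, 10, 10], [0.9, 0.6, 0.3]))
```

With three equally sized types and the hypothetical accuracies above, the first type grows to roughly 15 classifiers, the second stays at about 10, and the third shrinks to about 5. A practical version would still need to round these sizes and enforce the per-type limit on classifiers that the abstract mentions.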
Data stream classification techniques have recently played an important role in big data analytics due to their diverse applications (e.g. fraud and intrusion detection, forecasting and healthcare monitoring systems) and the growing number of real-world data stream generators (e.g. IoT devices and sensors, websites and social network feeds). Streaming data is often prone to evolution over time. In this context, the main challenge for computational models is to adapt to changes, known as concept drifts, using data mining and optimisation techniques. We present a novel ensemble technique called RED-PSO that seamlessly adapts to different concept drifts in non-stationary data stream classification tasks. RED-PSO is based on a three-layer architecture to produce classification types of different sizes, each created by randomly selecting a certain percentage of features from a pool of features of the target data stream. An evolutionary algorithm, namely Replicator Dynamics (RD), is used to seamlessly adapt to different concept drifts; it allows well-performing types to grow and poorly performing ones to shrink in size. In addition, the selected feature combinations in all classification types are optimised using a non-canonical version of the Particle Swarm Optimisation (PSO) technique for each layer individually. PSO allows the types in each layer to go towards local (within the same type)
The extensive growth of digital technologies such as the Internet of Things (IoT), social media networks and forecasting systems has led to new challenges regarding computational complexity and big data mining. The classification task in such applications is not trivial due to the high volume of related data and the limited time available for the task. It is particularly difficult when dealing with data streams, where each instance of data is typically processed once on its arrival (i.e. online) while the underlying data distribution often changes due to the changing environment. In this paper, we propose a novel ensemble-based framework called Replicator Dynamics & Genetic Algorithms Approach (RED-GENE) for effective data stream classification in changing environments that lead to concept drifts (i.e. evolution of data streams). RED-GENE employs three novel Replicator Dynamics (RD) strategies along with a Genetic Algorithm (GA) optimisation technique to flexibly adapt to different types of concept drift when performing data stream classification tasks. The proposed framework works as follows. First, a set of random feature combinations is drawn from a given pool of features of the target data stream to create different classification types. Next, RD is used to allow the classification types achieving higher classification accuracy to grow and those with lower accuracy to shrink. A modified version of the classic GA is then employed to optimise the randomly drawn combinations of features in each classification type. The proposed framework was tested using nine data streams (including both real-world and synthetic datasets) to investigate different variations of the framework and to compare its performance with other state-of-the-art algorithms using immediate and delayed prequential evaluation methods. The results demonstrated that the proposed framework provides the best accuracy on average compared with five other state-of-the-art algorithms.
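The immediate and delayed prequential evaluation methods named in the abstract above can be sketched as a test-then-train loop in which each instance is first used for prediction and its label is released to the learner either straight away or some number of instances later. The code below is a minimal sketch under stated assumptions, not the evaluation code used in the paper: `prequential_accuracy`, the `delay` parameter (measured in instances) and the toy `MajorityClass` learner are all hypothetical names introduced here.

```python
from collections import deque


class MajorityClass:
    """Toy incremental learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = {}

    def predict(self, x):
        if not self.counts:
            return None  # nothing learned yet
        return max(self.counts, key=self.counts.get)

    def learn(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1


def prequential_accuracy(stream, model, delay=0):
    """Test-then-train accuracy over `stream`, an iterable of (x, y) pairs.

    delay=0 is the immediate setting: the label is learned right after the
    prediction. delay=k (assumed definition) releases the label of an
    instance only after k further instances have been predicted.
    """
    pending = deque()  # instances whose labels have not been released yet
    correct = 0
    seen = 0
    for x, y in stream:
        # Test first: the model has never trained on this instance.
        if model.predict(x) == y:
            correct += 1
        seen += 1

        # Then train on every label that has now become available.
        pending.append((x, y))
        while len(pending) > delay:
            old_x, old_y = pending.popleft()
            model.learn(old_x, old_y)
    return correct / seen if seen else 0.0


if __name__ == "__main__":
    data = [(None, "a"), (None, "a"), (None, "b"), (None, "a")]
    print(prequential_accuracy(data, MajorityClass(), delay=0))
    print(prequential_accuracy(data, MajorityClass(), delay=2))
```

On this four-instance toy stream the immediate setting scores 0.5 and the two-instance delay scores 0.25, since the delayed learner has no labels to learn from until the third instance has been predicted.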
Data stream classification is the process of learning supervised models from continuous labelled examples in the form of an infinite stream that, in most cases, can be read only once by the data mining algorithm. One of the most challenging problems in this process is how to learn such models in non-stationary environments, where the data/class distribution evolves over time. This phenomenon is called concept drift. Ensemble learning techniques have proven effective in adapting to concept drifts. Ensemble learning is the process of learning a number of classifiers and combining them to predict incoming data using a combination rule. These techniques should incrementally process and learn from existing data within limited memory and time, both to predict incoming instances and to cope with different types of concept drift, including incremental, gradual, abrupt and recurring drifts. A large number of applications can benefit from classification of non-stationary data streams, including weather forecasting, stock market analysis, spam filtering systems, credit card fraud detection, traffic monitoring and sensor data analysis in Internet of Things (IoT) networks, to mention a few. Since each application has its own characteristics and conditions, it is difficult to introduce a single approach that would be suitable for all problem domains. This chapter studies the dynamic behaviour of existing ensemble methods (e.g. addition, removal and update of classifiers) in non-stationary data stream classification. It proposes a new, compact, yet informative formalisation of state-of-the-art methods. The chapter also presents results of our experiments comparing a diverse selection of best performing algo-