Random forests is currently one of the most used machine learning algorithms in the non-streaming (batch) setting. This preference is attributable to its high learning performance and low demands with respect to input preparation and hyper-parameter tuning. However, in the challenging context of evolving data streams, there is no random forests algorithm that can be considered state-of-the-art in comparison to bagging and boosting based algorithms.
Ensemble-based methods are among the most widely used techniques for data stream classification. Their popularity is attributable to their good performance in comparison to strong single learners while being relatively easy to deploy in real-world applications. Ensemble algorithms are especially useful for data stream learning as they can be integrated with drift detection algorithms and incorporate dynamic updates, such as selective removal or addition of classifiers. This work proposes a taxonomy for data stream ensemble learning as derived from reviewing over 60 algorithms. Important aspects such as combination, diversity, and dynamic updates are thoroughly discussed. Additional contributions include a listing of popular open-source tools and a discussion about current data stream research challenges and how they relate to ensemble learning (big data streams, concept evolution, feature drifts, temporal dependencies, and others).
Data stream mining is a fast-growing research topic due to the ubiquity of data in several real-world problems. Given their ephemeral nature, data stream sources are expected to undergo changes in data distribution, a phenomenon called concept drift. This paper focuses on one specific type of drift that has not yet been thoroughly studied, namely feature drift. Feature drift occurs whenever a subset of features becomes, or ceases to be, relevant to the learning task; learners must therefore detect and adapt to these changes. We survey existing work on feature drift adaptation, covering both explicit and implicit approaches. Additionally, we benchmark several algorithms and a naive proposal on synthetic and real-world datasets. The results of our experiments indicate the need for future research in this area, as even naive approaches produced gains in accuracy while reducing resource usage. Finally, we state current research topics, challenges, and future directions for feature drift adaptation.
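To make the definition of feature drift concrete, the following toy sketch builds a stream in which the label depends on one feature in the first half and on a different feature in the second half, then tracks a per-window relevance score. The generator `make_stream`, the score `|2 * agreement - 1|`, and the window size are all assumptions chosen for illustration; they are not taken from the surveyed work or its benchmark.

```python
def make_stream(n=400):
    """Yield ((x0, x1), label). The label copies x0 in the first half
    of the stream and x1 in the second half (a feature drift)."""
    for i in range(n):
        x0, x1 = i % 2, (i // 2) % 2
        label = x0 if i < n // 2 else x1
        yield (x0, x1), label


def window_relevance(window):
    """Toy relevance per binary feature: |2 * agreement - 1|, where
    agreement is the fraction of instances with feature == label.
    1.0 means fully predictive, 0.0 means no better than chance."""
    n_features = len(window[0][0])
    scores = []
    for f in range(n_features):
        agree = sum(x[f] == y for x, y in window) / len(window)
        scores.append(abs(2 * agree - 1))
    return scores


def relevance_over_time(stream, window_size=100):
    """Score each feature on consecutive, non-overlapping windows."""
    window, history = [], []
    for item in stream:
        window.append(item)
        if len(window) == window_size:
            history.append(window_relevance(window))
            window = []
    return history


history = relevance_over_time(make_stream())
for k, scores in enumerate(history):
    print(f"window {k}: relevance = {scores}")
```

In this toy stream the first feature is fully relevant in the first two windows and irrelevant afterwards, while the second feature behaves the opposite way. A learner that keeps relying on the first feature after the change would degrade, which is the situation feature drift adaptation is meant to handle.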
Finding reliable partners to interact with in open environments is a challenging task for software agents, and trust and reputation mechanisms are used to handle this issue. The growing body of research on this subject indicates that these mechanisms can be considered key elements in the design of multiagent systems (MASs). This article therefore presents an extensive but not exhaustive review of the most significant trust and reputation models published over the past two decades; hundreds of models were analyzed from two perspectives. The first is a combination of trust dimensions and principles proposed by some relevant authors in the field, with the models discussed from an MAS perspective. The second is a discussion of these dimensions that takes into account some types of interaction found in MASs, such as coalition, argumentation, negotiation, and recommendation. Through these analyses, we aim to find significant relations between trust dimensions and types of interaction, so that MASs can be constructed using the dimensions most relevant to each type of interaction, which may help developers in the design of MASs.
This work encompasses the development of a new ensemble classifier that uses a social network abstraction for data stream classification, namely the Social Adaptive Ensemble (SAE). In the context of data stream classification, concept drift is considered one of the most difficult and important issues to be addressed. Ensemble classifiers can be successfully applied to data streams as long as the ensemble adapts efficiently when a concept drift occurs. The SAE algorithm inherits strategies from other ensemble methods, such as Online Bagging [4] and DWM [2], and merges these with the notion of connectivity between classifiers that are similar w.r.t. their individual predictions. The relational data obtained by measuring similarities between classifiers is used to arrange ensemble members in a social network structure that allows us to identify subgroups (subnetworks) of highly similar classifiers. Being able to identify similar classifiers allows us to implement a combination strategy that first combines predictions within each subgroup of similar classifiers and then combines these subgroup predictions into the final prediction. Moreover, this combination strategy assigns more weight to the predictions of recently added classifiers during concept drifts, since these classifiers are dissimilar to all other existing classifiers. The similarity between classifiers is also used to identify and remove redundant classifiers, which saves system resources and sometimes improves accuracy. We present empirical experiments with synthetic data streams containing abrupt, gradual, and no drift, showing that SAE is a valid option for stream classification, especially when data stream characteristics (e.g. presence of abrupt drifts) are not known in advance and system resources, such as CPU time and memory space, are a concern.
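The two-stage combination described above can be sketched as follows. This is a minimal illustration and not the SAE implementation: the abstract does not specify the similarity measure, the connection threshold, or the weighting scheme, so the agreement-rate `similarity`, the `threshold=0.8` default, the connected-components grouping, and the unweighted majority votes are all assumptions made here.

```python
from collections import Counter
from itertools import combinations


def similarity(preds_a, preds_b):
    """Assumed measure: fraction of recent instances on which two
    classifiers made the same prediction."""
    agree = sum(a == b for a, b in zip(preds_a, preds_b))
    return agree / len(preds_a)


def subgroups(recent_preds, threshold):
    """Link classifiers whose similarity is >= threshold and return the
    connected components of that graph (lists of classifier indices)."""
    n = len(recent_preds)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if similarity(recent_preds[i], recent_preds[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def majority(labels):
    """Most frequent label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def combine(recent_preds, current_votes, threshold=0.8):
    """Vote inside each subgroup first, then vote across subgroups."""
    groups = subgroups(recent_preds, threshold)
    group_votes = [majority([current_votes[i] for i in g]) for g in groups]
    return majority(group_votes), groups


# Hypothetical ensemble: three long-standing, similar classifiers and two
# recently added ones whose recent predictions differ from everyone else's.
recent = [
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 1, 0],
]
votes = [0, 0, 0, 1, 1]

flat_prediction = majority(votes)
grouped_prediction, groups = combine(recent, votes)
print("subgroups:", groups)
print("flat vote:", flat_prediction, "| grouped vote:", grouped_prediction)
```

In this example a flat majority vote follows the three similar classifiers, whereas the grouped vote counts them as a single subgroup, so the two dissimilar classifiers carry more weight. This mirrors the effect the abstract attributes to recently added classifiers during a drift, under the assumptions stated above.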