As the proliferation of constant data feeds increases from social media, embedded sensors, and other sources, the capability to provide predictive concept labels to these data streams will become ever more important and lucrative. However, the dynamic, non-stationary nature, and effectively infinite length of data streams pose additional challenges for stream data mining algorithms. The sparse quantity of training data also limits the use of algorithms that are heavily dependent on supervised training. To address all these issues, we propose an incremental semi-supervised method that provides general concept class label predictions, but it also tracks concept clusters within the feature space using an innovative new online clustering algorithm. Each concept cluster contains an embedded stream classifier, creating a diverse ensemble for data instance classification within the generative model used for detecting emerging concepts in the stream. Unlike other recent novel class detection methods, our method goes beyond detecting, and continues to differentiate and track the emerging concepts. We show the effectiveness of our method on several synthetic and real world data sets, and we compare the results against other leading baseline methods.
Classification of data points in a data stream is a fundamentally different set of challenges than data mining on static data. While streaming data is often placed into the context of "Big Data"(or more specifically "Fast Data") wherein one-pass algorithms are used, true data streams offer additional hurdles due to their dynamic, evolving, and non-stationary nature. During the stream, the available labels (or concepts) often change, and a concept's definition in the feature space can also evolve (or drift) over time. The core issue is that the hidden generative function of the data is not a constant function, but rather evolves over time. This is known as a non-stationary distribution. In this paper, we describe a new approach to using ensembles for stream classification. While the core method is straightforward, it is specifically designed to adapt quickly with very little overhead to the dynamic and evolving nature of data streams generated from non-stationary functions. Our method, M 3 , is based on a weighted majority ensemble of heterogeneous model types where model weights are updated on-line using Reinforcement Learning techniques. We compare our method with current leading algorithms as implemented in the Massive Online Analysis (MOA) framework using UCI benchmark and synthetic stream generator data sets, and find that our method shows particularly strong gain over the baseline method when ground truth is of limited availability to the classifiers.
As ISR sensors proliferate in theater, the autonomous and dynamic control of those sensors, especially those on board unmanned aircraft systems, will become a necessity as personnel and resource constraints drive the shift in operator control from manual tasking and prioritization to system supervision.This paper explores the experimentation, analysis, and viability of a decentralized, auction-less market-based algorithm to fulfill such a need.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.