Although the availability of sensor data is becoming prevalent across many domains, it remains a challenge to make sense of sensor data efficiently and effectively in order to provide users with relevant services. The concept of virtual sensors is a step towards this goal; however, virtual sensors are often used to denote homogeneous types of data, generally retrieved from a predetermined group of sensors. The DIVS (Dynamic Intelligent Virtual Sensors) concept was introduced in previous work to extend and generalize the notion of a virtual sensor to a dynamic setting with heterogeneous sensors. This paper introduces a refined version of the DIVS concept by integrating an interactive machine learning mechanism, which enables the system to take input from both the user and the physical world. The paper empirically validates some of the properties of the DIVS concept. In particular, we are concerned with the distribution of different budget allocations for labelled data, as well as proactive user labelling strategies. We report results suggesting that relatively good accuracy can be achieved despite a limited budget in an environment with dynamic sensor availability, while proactive labelling yields further improvements in performance.
In interactive machine learning, human users and learning algorithms work together to solve challenging learning problems, e.g. when annotated data is limited or absent, or when trust is an issue. As annotating data can be costly, it is important to minimize the amount of annotated data needed for training while still achieving high classification accuracy. This is done by attempting to select the most informative data instances for training, where the number of instances is limited by a labelling budget. In an online learning setting, the decision of whether or not to select an instance for labelling has to be made on the fly, as the data arrives sequentially and is only valid for a limited time period. We present a taxonomy of interactive online machine learning strategies. An interactive learning strategy determines which instances to label in an unlabelled dataset. In the taxonomy we differentiate between interactive learning strategies in which the computer controls the learning process (active learning) and those in which human users control the learning process (machine teaching). We then make a distinction between what triggers the learning: active learning could be triggered by uncertainty, time, or randomly, whereas machine teaching could be triggered by errors, state changes, time, or factors related to the user. We also illustrate the taxonomy by implementing versions of the different strategies and performing experiments on a benchmark dataset as well as on a synthetically generated dataset. The results show that the choice of interactive learning strategy affects performance, especially at the beginning of the online learning process, when there is a limited amount of labelled data.
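The trigger distinctions in the taxonomy above can be illustrated with a minimal sketch. The classifier, thresholds, and data below are illustrative assumptions, not the paper's implementation: an uncertainty-triggered strategy queries the oracle when the model's decision margin is small, while a randomly-triggered one queries with a fixed probability, both constrained by a shared labelling budget.

```python
import random

class OnlineCentroidClassifier:
    """Toy incremental classifier: one running centroid per class (1-D features)."""
    def __init__(self):
        self.centroids = {}  # label -> (running sum, count)

    def predict_with_margin(self, x):
        """Return (predicted label, margin between the two closest centroids)."""
        if not self.centroids:
            return None, 0.0
        dists = sorted((abs(x - s / c), lab) for lab, (s, c) in self.centroids.items())
        margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
        return dists[0][1], margin

    def update(self, x, label):
        s, c = self.centroids.get(label, (0.0, 0))
        self.centroids[label] = (s + x, c + 1)

def run_stream(stream, strategy, budget, margin_threshold=1.0, seed=0):
    """Process (x, true_label) pairs on the fly; query the oracle only
    when the chosen strategy triggers and labelling budget remains."""
    rng = random.Random(seed)
    model, correct, n = OnlineCentroidClassifier(), 0, 0
    for x, y in stream:
        pred, margin = model.predict_with_margin(x)
        if pred is not None:           # track prediction accuracy
            correct += int(pred == y)
            n += 1
        if strategy == "uncertainty":  # query when unsure or classes unseen
            trigger = (pred is None or len(model.centroids) < 2
                       or margin < margin_threshold)
        else:                          # "random": query with fixed probability
            trigger = rng.random() < 0.5
        if trigger and budget > 0:
            model.update(x, y)         # oracle supplies the true label
            budget -= 1
    return correct / max(n, 1)
```

With two well-separated classes, the uncertainty trigger tends to spend its few labels early in the stream, which mirrors the abstract's observation that strategy choice matters most at the beginning of online learning.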
Advances in the Internet of Things lead to an increasing number of devices generating and streaming data. These devices can be useful data sources for activity recognition using machine learning. However, the set of available sensors may vary over time, e.g. due to sensor mobility and technical failures. Since the machine learning model uses the data streams from the sensors as input, it must be able to handle a varying number of input variables, i.e. a feature space that might change over time. Moreover, the labelled data necessary for training is often costly to acquire. In active learning, the model is given a budget for requesting labels from an oracle and aims to maximize accuracy by carefully selecting which data instances to label. It is generally assumed that the oracle's role is only to respond to queries and that it will always do so. In many real-world scenarios, however, the oracle is a human user, and these assumptions are simplifications that might not give a proper depiction of the setting. In this work we investigate several interactive machine learning strategies, of which active learning is one, that explore the effects of a more proactive oracle and of factors that might influence a user to provide or withhold labels. We implement five interactive machine learning strategies, as well as hybrid versions of them, and evaluate them on two datasets. The results show that a more proactive user can improve performance, especially when the user is influenced by the accuracy of earlier predictions. The experiments also highlight challenges related to evaluating performance when the set of classes changes over time.
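The proactive-user idea described above can be sketched as a small simulation. The model, stream, and budget handling below are my own illustrative assumptions, not the five strategies evaluated in the paper: a simulated user volunteers the true label whenever they notice the model err (error-triggered machine teaching), until a labelling budget is exhausted.

```python
class TinyCentroidModel:
    """Toy incremental classifier: per-class running mean over 1-D features."""
    def __init__(self):
        self.means = {}  # label -> (running sum, count)

    def predict(self, x):
        if not self.means:
            return None
        return min(self.means,
                   key=lambda lab: abs(x - self.means[lab][0] / self.means[lab][1]))

    def update(self, x, label):
        s, c = self.means.get(label, (0.0, 0))
        self.means[label] = (s + x, c + 1)

def error_triggered_teaching(stream, budget):
    """Machine teaching driven by errors: a simulated proactive user
    supplies the true label whenever the prediction is visibly wrong
    (or missing), as long as the labelling budget allows."""
    model, correct, n = TinyCentroidModel(), 0, 0
    for x, y in stream:
        pred = model.predict(x)
        if pred is not None:        # track prediction accuracy
            correct += int(pred == y)
            n += 1
        if (pred is None or pred != y) and budget > 0:
            model.update(x, y)      # the user volunteers a correction
            budget -= 1
    return correct / max(n, 1)
```

Because the user only spends budget on observed errors, labels concentrate where the model is actually wrong, which is one plausible reading of why a proactive, accuracy-aware user improves performance in the experiments.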