Recording sensor data is seldom a perfect process. Failures in power, communication, or storage can leave occasional blocks of data missing, which not only affects real-time monitoring but also compromises the quality of near-line and offline data analysis. Several recovery (imputation) algorithms have been proposed to replace missing blocks. Unfortunately, little is known about their relative performance, because existing comparisons cover only a small subset of the relevant algorithms, very few datasets, or often both, which makes it difficult to draw general conclusions. In this paper, we empirically compare twelve recovery algorithms using a novel benchmark. All but two of the algorithms were re-implemented in a uniform test environment. The benchmark gathers ten different datasets, which collectively represent a broad range of applications. Our benchmark allows us to fairly evaluate the strengths and weaknesses of each approach and to recommend the best technique on a use-case basis. It also allows us to identify the limitations of the current body of algorithms and to suggest future research directions.
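The evaluation idea described above (remove a block, recover it, compare against the ground truth) can be sketched as follows. This is only an illustrative sketch, not the benchmark from the paper: the synthetic sine series, the two baseline methods, the RMSE metric, and all function names are assumptions introduced here.

```python
import numpy as np

def mask_block(series, start, length):
    """Return a copy with one contiguous block set to NaN (a simulated outage)."""
    damaged = series.astype(float).copy()
    damaged[start:start + length] = np.nan
    return damaged

def linear_interpolation(series):
    """Baseline recovery: interpolate linearly between the observed points."""
    filled = series.copy()
    missing = np.isnan(filled)
    idx = np.arange(len(filled))
    filled[missing] = np.interp(idx[missing], idx[~missing], filled[~missing])
    return filled

def mean_fill(series):
    """Baseline recovery: replace every missing value with the observed mean."""
    filled = series.copy()
    observed_mean = np.nanmean(filled)
    filled[np.isnan(filled)] = observed_mean
    return filled

def rmse_on_block(truth, recovered, start, length):
    """Root-mean-square error, measured only on the block that was removed."""
    block = slice(start, start + length)
    return float(np.sqrt(np.mean((truth[block] - recovered[block]) ** 2)))

# Hypothetical ground truth: one period of a sine wave.
t = np.linspace(0.0, 1.0, 200)
truth = np.sin(2.0 * np.pi * t)

start, length = 90, 20
damaged = mask_block(truth, start, length)

methods = {"linear_interpolation": linear_interpolation, "mean_fill": mean_fill}
scores = {name: rmse_on_block(truth, recover(damaged), start, length)
          for name, recover in methods.items()}

for name, score in scores.items():
    print(f"{name}: RMSE = {score:.4f}")
```

Scoring only the removed block, rather than the whole series, keeps the untouched observations from diluting the error. A fuller comparison would repeat this over several datasets, block sizes, and block positions.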
With the emergence of the Internet of Things (IoT), time series streams have become ubiquitous in our daily lives. Recording such data is rarely a perfect process, as sensor failures frequently occur, yielding occasional blocks of data that go missing in multiple time series. These missing blocks not only affect real-time monitoring but also compromise the quality of online data analyses. Effective streaming recovery (imputation) techniques either have a quadratic runtime complexity, which is infeasible for even moderately sized data, or cannot recover more than one time series at a time. In this paper, we introduce a new online recovery technique that recovers multiple time series streams in linear time. Our recovery technique implements a novel incremental version of the Centroid Decomposition technique and reduces its complexity from quadratic to linear. Using this incremental technique, missing blocks are efficiently recovered in a continuous manner based on previous recoveries. We formally prove the correctness of our new incremental computation, which yields an accurate recovery. Our experimental results on real-world time series show that our recovery technique is, on average, 30% more accurate than the state of the art while being vastly more efficient.
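To make the role of Centroid Decomposition in recovery more concrete, here is a minimal batch sketch. It is not the incremental, linear-time algorithm the abstract introduces. It assumes a plain iterative scheme: initialise the missing values by interpolation, compute a truncated decomposition, overwrite the missing entries with the reconstruction, and repeat. The greedy sign-vector search, the function names, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def sign_vector(X):
    """Greedy local search for z in {-1, +1}^n that increases ||X^T z||."""
    z = np.ones(X.shape[0])
    improved = True
    while improved:
        improved = False
        s = X.T @ z
        best = s @ s
        for i in range(X.shape[0]):
            s_flip = s - 2.0 * z[i] * X[i]  # effect of flipping z[i]
            value = s_flip @ s_flip
            if value > best + 1e-12:
                z[i] = -z[i]
                s, best = s_flip, value
                improved = True
    return z

def centroid_decomposition(X, rank):
    """Return L (n x rank) and R (m x rank) such that X is approximated by L @ R.T."""
    residual = X.astype(float).copy()
    n, m = residual.shape
    L, R = np.zeros((n, rank)), np.zeros((m, rank))
    for k in range(rank):
        c = residual.T @ sign_vector(residual)
        norm = np.linalg.norm(c)
        if norm == 0.0:
            break
        R[:, k] = c / norm               # centroid (unit) vector
        L[:, k] = residual @ R[:, k]     # loading vector
        residual -= np.outer(L[:, k], R[:, k])
    return L, R

def interpolate_columns(X):
    """Fill NaNs column by column using linear interpolation."""
    filled = X.astype(float).copy()
    idx = np.arange(X.shape[0])
    for j in range(X.shape[1]):
        miss = np.isnan(filled[:, j])
        if miss.any():
            filled[miss, j] = np.interp(idx[miss], idx[~miss], filled[~miss, j])
    return filled

def recover(X, rank=1, max_iter=100, tol=1e-6):
    """Iteratively replace missing entries with a truncated-decomposition estimate."""
    missing = np.isnan(X)
    filled = interpolate_columns(X)
    for _ in range(max_iter):
        L, R = centroid_decomposition(filled, rank)
        approx = L @ R.T
        change = np.linalg.norm(approx[missing] - filled[missing])
        filled[missing] = approx[missing]
        if change < tol:
            break
    return filled

# Hypothetical data: three perfectly correlated series, with a block missing in the first.
base = np.sin(np.linspace(0.0, 6.0 * np.pi, 120))
truth = np.outer(base, [1.0, 2.0, -1.5])
damaged = truth.copy()
damaged[40:60, 0] = np.nan

recovered = recover(damaged, rank=1)
cd_error = float(np.max(np.abs(recovered[40:60, 0] - truth[40:60, 0])))
interp_error = float(np.max(np.abs(interpolate_columns(damaged)[40:60, 0] - truth[40:60, 0])))
print(f"max error, interpolation only: {interp_error:.4f}")
print(f"max error, decomposition-based: {cd_error:.6f}")
```

In this sketch every iteration recomputes the decomposition from scratch, so the cost grows with the amount of data seen. That repeated work is the kind of overhead an incremental formulation aims to avoid.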