This paper shows that one can be competitive with the k-means objective while operating online. In this model, the algorithm receives vectors v 1 , . . . , v n one by one in an arbitrary order. For each vector v t the algorithm outputs a cluster identifier before receiving v t+1 . Our online algorithm generates O(k log n log γn) clusters whose expected k-means cost is O(W * log n). Here, W * is the optimal k-means cost using k clusters and γ is the aspect ratio of the data. The dependence on γ is shown to be unavoidable and tight. We also show that, experimentally, it is not much worse than kmeans++ while operating in a strictly more constrained computational model.
Decomposing a complex time series into trend, seasonality, and remainder components is an important primitive that facilitates time series anomaly detection, change point detection, and forecasting. Although numerous batch algorithms are known for time series decomposition, none operate well in an online scalable setting where high throughput and real-time response are paramount. In this paper, we propose OnlineSTL, a novel online algorithm for time series decomposition which is highly scalable and is deployed for real-time metrics monitoring on high-resolution, high-ingest rate data. Experiments on different synthetic and real world time series datasets demonstrate that OnlineSTL achieves orders of magnitude speedups (100x) for large seasonalities while maintaining quality of decomposition.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.