“…Given the evolution of CPU performance, where processor clock speeds are no longer increasing because of the power wall, algorithmic speedups can now come mainly from exploiting parallelism [7,12,31,34,55,60,69,71,78,83,87,88]. This involves (i) parallelism across compute nodes (e.g., using Spark) [48,85], where the main goal is to scale to datasets that a single node cannot easily handle, and (ii) parallelism inside a single compute node (e.g., ex-

1 A data series, or data sequence, is an ordered sequence of data points.…”