“…To address this challenge, existing performance-modeling efforts [21,25,29] and machine-learning approaches [4,18,28] must tolerate huge offline training overhead to build an accurate online model for each framework, since they consider only low-level metrics (such as resource utilization) within a single framework. Unfortunately, they must spend considerable time training new models for similar applications on each new framework, even though recent works [3,5,10] have shown that such similar applications, in both Hadoop and Spark, cover a wide range of use cases (micro-benchmarks, machine learning, stream processing, etc.).…”