“…In comparison, Grade10 offers finer-grained resource utilization by upsampling to individual execution phases for single-workload bottleneck detection. Closest to Grade10 is the work of Tian et al. [11], which also performs performance characterization using a DAG-based computation model with system-level resource monitoring. However, Grade10 captures a more comprehensive set of performance issues (e.g., burstiness and imbalance), attributes resource usage at a finer granularity across time and execution phases (in contrast to the coarser machine-learning-based attribution used by Tian et al.), and is evaluated more thoroughly, with two state-of-the-art graph frameworks, two datasets, and four algorithms (compared with just two workloads in Tian et al.).…”