“…To increase data locality, TB through tiling techniques (Bandishti et al., 2012; Grosser et al., 2014b; Malas et al., 2015; Orozco and Gao, 2009; Strzodka et al., 2011; Wellein et al., 2009; Wonnacott, 2000; Yuan et al., 2017; Zhou, 2013) has been widely studied, often with advanced programming models that favor asynchronous execution. Performance tuning with roofline models (Datta, 2009; Etienne et al., 2017; Nguyen et al., 2010; Titarenko and Hildyard, 2017) remains an important assessment step for stencil computations to ensure good utilization of the underlying hardware resources. Some of these efforts have translated into software releases (e.g.…”