“…Class 1b functions benefit from the NDP system, but primarily because of the lower memory access latency (and energy) that the NDP system provides for memory requests that need to be serviced by DRAM. These functions could benefit from other latency and energy reduction techniques, such as L2/L3 cache bypassing [151, 189, 190, 205, 247, 269, 354, 356, 365, 378, 395, 396, 403], low-latency DRAM [62-66, 75, 86, 163-165, 212, 236, 238-240, 256, 263, 271, 314, 352, 355, 358, 375, 417], and better memory access scheduling [24, 100, 102, 129, 173, 181, 221, 222, 275, 277, 294, 295, 343, 344, 384-387, 405, 412, 425, 438, 450]. However, they generally do not benefit significantly from prefetching (as seen in Figure 5(b)), since infrequent memory requests make it difficult for the prefetcher to successfully train on an access pattern.…”
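
The claim about prefetcher training can be illustrated with a toy model. The sketch below is not the prefetcher evaluated in the quoted work; it is a hypothetical single-entry stride prefetcher, and the names `StridePrefetcher`, `confidence_threshold`, `timeout_cycles`, and `count_prefetches` are assumptions introduced only for illustration. The model assumes that training state is discarded when no request arrives for `timeout_cycles`, which is one plausible way that infrequent requests could prevent a prefetcher from ever reaching its confidence threshold.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StridePrefetcher:
    """Toy single-entry stride prefetcher (hypothetical, for illustration only)."""

    confidence_threshold: int = 2  # assumed: matching strides needed before prefetching
    timeout_cycles: int = 1000     # assumed: idle time after which training state is dropped
    last_addr: Optional[int] = None
    last_stride: Optional[int] = None
    last_cycle: int = 0
    confidence: int = 0

    def access(self, cycle: int, addr: int) -> Optional[int]:
        """Observe one memory request; return a prefetch address once trained, else None."""
        # Discard stale training state if the previous request was too long ago.
        if self.last_addr is not None and cycle - self.last_cycle > self.timeout_cycles:
            self.last_addr = None
            self.last_stride = None
            self.confidence = 0

        prefetch = None
        if self.last_addr is not None:
            stride = addr - self.last_addr
            # Confidence grows only while the same stride keeps repeating.
            if stride == self.last_stride:
                self.confidence += 1
            else:
                self.confidence = 0
            self.last_stride = stride
            if self.confidence >= self.confidence_threshold:
                prefetch = addr + stride

        self.last_addr = addr
        self.last_cycle = cycle
        return prefetch


def count_prefetches(gap_cycles: int, n_requests: int = 100, stride: int = 64) -> int:
    """Count prefetches issued for a perfectly regular stream with a fixed request gap."""
    pf = StridePrefetcher()
    issued = 0
    for i in range(n_requests):
        if pf.access(cycle=i * gap_cycles, addr=i * stride) is not None:
            issued += 1
    return issued


if __name__ == "__main__":
    print("frequent requests (gap=10):", count_prefetches(10))
    print("infrequent requests (gap=5000):", count_prefetches(5000))
```

Under these assumed parameters, the same perfectly regular access stream trains the toy prefetcher after a few requests when the requests arrive close together, but never trains it when the gap between requests exceeds the timeout. Real prefetchers differ in how they track and evict training state, so the sketch shows the mechanism only qualitatively.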