“…In the 2010s, mainly due to the advent of 3D-stacking integration, this approach re-emerged as a better fit for power- and area-constrained devices. However, this type of NDP requires innovative solutions for programming models, cache coherence, and virtual memory support [115,116,117]. Many works [7,15,16,32,52,67,68,69,72,74,79,80] proposed the use of custom functional-unit-like (FU-like) logic to exploit the bandwidth available within memory devices.…”