“…First, most NDP architectures [8, 19, 25, 38, 42-46, 49, 55, 67, 98, 110, 111, 113, 119, 155, 158] lack shared caches that can enable low-cost communication and synchronization among NDP cores of the system. Second, hardware cache coherence protocols are typically not supported in NDP systems [8,19,25,38,[42][43][44][45]49,55,67,82,98,111,119,155,158], due to high area and traffic overheads associated with such protocols [46,143]. Third, NDP systems are non-uniform, distributed architectures, in which inter-unit communication is more expensive (both in performance and energy) than intraunit communication [8,20,21,38,43,83,155,158].…”