“…A small level-0 data cache (L 0 ) [25], a different form of small buffer [12], or a L 0 only for critical loads [19] helps to reduce the load latency. However, since access to the L 0 requires going through normal pipeline stages to decode and generate addresses, it usually cannot achieve 0-cycle loads.…”