“…By encapsulating the loop body and the loads/stores in different functions, the HLS compiler is able to schedule calls without dependencies in the same cycle.

    …[8], a2[8];
    float b1[8], b2[8];
    float c1[8], c2[8];
    load(a1,b1,a,b);
    //n is multiple of 2 and n >= 2
    for (int k = 0; k < n-2; ++k) {
        loadStore(a2,b2,c2,a+k*8,b+k*8,c+(k-1)*8,k);
        loopBody(a1,b1,c1);
        ++k;
        loadStore(a1,b1,c1,a+k*8,b+k*8,c+(k-1)*8,k);
        loopBody(a2,b2,c2);
    }
    int k = n-1;
    loadStore(a2,b2,c2,a+k*8,b+k*8,c+(k-1)*8,k);
    loopBody(a1,b1,c1);
    store(c1,c+k*8);
    loopBody(a2,b2,c2);
    store(c1,c+k*8);
    }

Listing 3: Proposal of OmpSs pragma syntax (vectorAdd) and generated Vivado HLS code (vectorAddTransformed) to pipeline loads/stores with computation

In the example of listing 3, the first loadStore function call of vectorAddTransformed is scheduled alongside the first loopBody call. The other two calls are also scheduled together after the first two.…”
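The overlap described above rests on a ping-pong (double-buffering) scheme: while one set of buffers is being computed on, the other set is being filled and drained. The following is a minimal software sketch of that idea in plain C++, not the code generated by the proposed tool. The name vectorAddPingPong, the constant BLOCK, and the two-dimensional buffer layout are assumptions made for illustration, and the helper bodies are guesses at what load, store, and loopBody do for a vector addition. It only shows that the calls inside one iteration touch disjoint buffers; whether a given HLS tool actually schedules them together is not established by this sketch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical block size, chosen to match the 8-element buffers in the listing.
constexpr std::size_t BLOCK = 8;

// Copy one block of each input from memory into local buffers.
static void load(float a_buf[BLOCK], float b_buf[BLOCK],
                 const float *a, const float *b) {
    for (std::size_t i = 0; i < BLOCK; ++i) {
        a_buf[i] = a[i];
        b_buf[i] = b[i];
    }
}

// Copy one block of results from a local buffer back to memory.
static void store(const float c_buf[BLOCK], float *c) {
    for (std::size_t i = 0; i < BLOCK; ++i) {
        c[i] = c_buf[i];
    }
}

// The computation itself: element-wise addition of one block.
static void loopBody(const float a_buf[BLOCK], const float b_buf[BLOCK],
                     float c_buf[BLOCK]) {
    for (std::size_t i = 0; i < BLOCK; ++i) {
        c_buf[i] = a_buf[i] + b_buf[i];
    }
}

// Adds n_blocks blocks of BLOCK floats each, alternating between two buffer sets.
// Precondition (assumed here): n_blocks >= 1.
void vectorAddPingPong(const float *a, const float *b, float *c,
                       std::size_t n_blocks) {
    assert(n_blocks > 0);
    float a_buf[2][BLOCK], b_buf[2][BLOCK], c_buf[2][BLOCK];

    // Prologue: fetch block 0 before any computation can start.
    load(a_buf[0], b_buf[0], a, b);

    for (std::size_t k = 0; k < n_blocks; ++k) {
        const std::size_t cur = k % 2;   // buffer set being computed on
        const std::size_t nxt = 1 - cur; // buffer set being filled and drained

        // The load and store below use only the nxt buffers, while loopBody
        // uses only the cur buffers, so no data dependency orders them.
        if (k + 1 < n_blocks) {
            load(a_buf[nxt], b_buf[nxt], a + (k + 1) * BLOCK, b + (k + 1) * BLOCK);
        }
        if (k > 0) {
            store(c_buf[nxt], c + (k - 1) * BLOCK); // result of block k-1
        }
        loopBody(a_buf[cur], b_buf[cur], c_buf[cur]);
    }

    // Epilogue: the result of the last block has not been written back yet.
    store(c_buf[(n_blocks - 1) % 2], c + (n_blocks - 1) * BLOCK);
}
```

The sketch selects the buffer set with a run-time index for brevity. The listing instead unrolls the loop by two and names the buffers explicitly (a1/a2, b1/b2, c1/c2), which makes the independence of the calls visible to the compiler without any index analysis.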