“…These operations can be implemented in a pipeline, as shown in Table II, maximizing the achievable clock frequency of the system. Additionally, these intermediate registers allow the FPGA compiler to implement the multiplications more efficiently, taking advantage of DSP blocks, i.e., hardware specifically intended for this type of operation [15]. The downside of this approach is a three-clock-cycle delay on the output, which is acceptable in the majority of cases.…”
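
The latency trade-off described above can be illustrated with a small cycle-level model. This is only a sketch under stated assumptions: the excerpt does not reproduce Table II, so the split into an input register, a product register, and a sum register, the multiply-add operation itself, and the names `MacPipeline`, `step`, and `run` are all hypothetical. The only property taken from the text is that a result appears three clock cycles after its operands are presented.

```python
from typing import List, Optional, Tuple

Operands = Tuple[int, int, int]  # hypothetical (a, b, c) for y = a * b + c


class MacPipeline:
    """Cycle-level model of a hypothetical 3-stage multiply-add pipeline."""

    LATENCY = 3  # clock cycles between presenting operands and seeing the result

    def __init__(self) -> None:
        self.r_in: Optional[Operands] = None          # stage 1: latched operands
        self.r_mul: Optional[Tuple[int, int]] = None  # stage 2: (a * b, c)
        self.r_out: Optional[int] = None              # stage 3: a * b + c

    def step(self, operands: Optional[Operands]) -> Optional[int]:
        """Simulate one clock cycle.

        Returns the output visible during this cycle (before the clock edge),
        then updates all three registers at once, as a rising edge would.
        """
        visible = self.r_out

        # Next-state values are computed from the current register contents only.
        next_out = None if self.r_mul is None else self.r_mul[0] + self.r_mul[1]
        next_mul = None if self.r_in is None else (self.r_in[0] * self.r_in[1], self.r_in[2])

        self.r_in, self.r_mul, self.r_out = operands, next_mul, next_out
        return visible


def run(inputs: List[Operands]) -> List[Optional[int]]:
    """Feed one operand set per cycle, then idle cycles to drain the pipeline."""
    pipe = MacPipeline()
    stream: List[Optional[Operands]] = list(inputs) + [None] * MacPipeline.LATENCY
    return [pipe.step(x) for x in stream]


if __name__ == "__main__":
    print(run([(1, 2, 3), (4, 5, 6), (7, 8, 9)]))
    # prints [None, None, None, 5, 26, 65]
```

Each stage does at most one arithmetic operation between registers, which is what shortens the critical path in hardware. The first result is delayed by three cycles, but once the pipeline is full, one new result is produced every cycle.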