This paper proposes a new deterministic branch prediction unit to achieve a uniformly timed instruction set architecture (ISA). The deterministic ISA is achieved by utilizing two address buses in conjunction with the dual-port block RAMs that are common in commercial FPGAs. The goal is to remove mandatory branch and load delays so that every instruction takes a uniform one clock cycle. To demonstrate the concept, the proposed architecture is applied to the Xilinx PicoBlaze firm core. The result is a new soft core named DAP-Zipi8 that reduces the clocks per instruction (CPI) metric of PicoBlaze from two to one at the expense of extra logic and a longer critical path. The longer critical path reduces the maximum achievable clock frequency from 357.509 MHz to 224.022 MHz. Combining the gain in CPI with the loss in maximum clock frequency still improves overall processor performance by 18.28–19.49%. The high-performance deterministic DAP-Zipi8 is a viable choice for hard RTES applications.
In this paper, a hardware computing unit has been designed and implemented. This unit computes many elementary functions (such as sine, cosine, tan⁻¹, sinh, cosh, and square root) whose computation in software requires thousands of clock cycles. The architecture was designed in VHDL and placed on an XC3S500E FPGA chip of the Spartan-3E family as the target technology. Two algorithms are used to compute the mathematical functions, because both can be implemented on an FPGA chip. The first is the Coordinate Rotation Digital Computer (CORDIC) algorithm, which was introduced in 1959. It is a single unified algorithm for calculating many elementary functions, including trigonometric, hyperbolic, logarithmic, and exponential functions, as well as multiplication, division, and square root. The second uses a lookup table. By exploiting the self-similarity of the trigonometric functions and applying parallel pipelining to the CORDIC algorithm, a speedup of (24.7–30.3)×100% is obtained compared with other parallel architectures. The throughput is one operation per clock pulse, except for the first operation, whose latency is 32 clock pulses.
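For readers unfamiliar with CORDIC, the following is a minimal software sketch of its rotation mode for sine and cosine. It is not the paper's VHDL design: the function name `cordic_sin_cos`, the use of floating point instead of fixed-point shifts, and the default of 32 iterations (chosen only to echo the 32-clock latency mentioned above) are all assumptions made for illustration.

```python
import math


def cordic_sin_cos(theta, iterations=32):
    """Rotation-mode CORDIC sketch (hypothetical helper, not the paper's design).

    theta is in radians and is assumed to satisfy |theta| <= pi/2,
    which lies inside the basic CORDIC convergence range.
    Returns the pair (sin(theta), cos(theta)).
    """
    # Elementary rotation angles atan(2^-i); hardware would hold these in a small ROM.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]

    # Each micro-rotation stretches the vector by sqrt(1 + 2^-2i).
    # Starting from x = K (the inverse of the total gain) compensates for this.
    k = 1.0
    for i in range(iterations):
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))

    x, y, z = k, 0.0, theta
    for i in range(iterations):
        # Rotate toward z = 0: direction follows the sign of the residual angle.
        d = 1.0 if z >= 0 else -1.0
        # In hardware the multiplications by 2^-i are plain right shifts.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x


if __name__ == "__main__":
    s, c = cordic_sin_cos(math.pi / 6)
    print(s, c)
```

Each loop iteration uses only additions, subtractions, and scalings by powers of two, which is why the algorithm maps well onto an FPGA pipeline in which one iteration becomes one stage.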