Unavailability of functional units is a major performance bottleneck in general-purpose processors (GPPs). In a GPP with a limited number of functional units, one functional unit may be heavily utilized at times, creating a performance bottleneck, while the other functional units are under-utilized. We propose a novel idea for adapting functional units in a GPP architecture to overcome this challenge. For this purpose, a selected set of complex functional units that might be under-utilized, such as the multiplier and divider, are realized using a programmable look-up-table-based fabric. This allows run-time adaptation of functional units to improve performance. The programmable look-up tables are realized using magnetic tunnel junction (MTJ) based memories, which dissipate near-zero leakage power and are CMOS compatible. We have applied this idea to a dual-issue architecture. The results show that, compared to a design with all-CMOS functional units, an average performance improvement of 18% is achieved on standard benchmarks. This comes with a 4.1% power increase on integer benchmarks and a 2.3% power decrease on floating-point benchmarks, compared to a CMOS design.
Unavailability of functional units and their unequal activity create performance bottlenecks and thermal hot spots in general-purpose processors. We propose to use reconfigurable functional units to overcome these challenges. A selected set of complex functional units that might be under-utilized, such as a multiplier and divider, are realized in a time-multiplexed fashion using a shared programmable Look Up Table (LUT) based fabric. This allows run-time reconfiguration of the units and migration of their activity. The LUT-based implementation also allows under-utilized functional units to be dynamically reconfigured into the functional units that are a performance bottleneck, thereby improving performance. The programmable LUTs are realized using Spin Transfer Torque (STT) magnetic technology (also called STT-NV) due to its zero leakage and CMOS compatibility. The results show a significant performance improvement of 16% on average across standard benchmarks when the CMOS multiplier and divider are replaced with their reconfigurable STT-NV LUT counterparts. In addition, reconfiguration reduces the maximum temperature of functional units by up to 27 °C and almost eliminates the thermal variation across them. This comes with a small power overhead and no area impact.
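The reconfiguration idea can be illustrated with a toy software model. The sketch below is not the authors' design: the class `LutFabric`, its 2-bit operand width, and the one-table-per-output-bit layout are assumptions chosen only to show how rewriting LUT contents turns the same fabric from one functional unit into another.

```python
class LutFabric:
    """Toy model (hypothetical): one 4-input LUT per output bit."""

    def __init__(self, n_inputs=4, n_outputs=4):
        self.n_inputs = n_inputs
        self.n_outputs = n_outputs
        # Each table holds 2**n_inputs one-bit entries, initially all zero.
        self.tables = [[0] * (1 << n_inputs) for _ in range(n_outputs)]

    def configure(self, func):
        """Rewrite the table contents so the fabric computes func(a, b)."""
        half = self.n_inputs // 2
        mask = (1 << half) - 1
        for addr in range(1 << self.n_inputs):
            a, b = addr >> half, addr & mask
            result = func(a, b)
            for bit in range(self.n_outputs):
                self.tables[bit][addr] = (result >> bit) & 1

    def evaluate(self, a, b):
        """Look up each output bit at the address formed by the two operands."""
        half = self.n_inputs // 2
        addr = (a << half) | b
        return sum(self.tables[bit][addr] << bit
                   for bit in range(self.n_outputs))


fabric = LutFabric()

# Configure the fabric as a 2-bit multiplier.
fabric.configure(lambda a, b: a * b)
product = fabric.evaluate(3, 3)

# Reconfigure the same fabric as a 2-bit adder at run time.
fabric.configure(lambda a, b: a + b)
total = fabric.evaluate(3, 3)

print(product, total)
```

In this model, reconfiguration is just a rewrite of the stored bits, which is why a non-volatile, low-leakage memory is an attractive substrate for the tables. Real fabrics decompose wide operators across many small LUTs and routing resources, which the sketch does not attempt to model.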