Square-rich fixed point polynomial evaluation on FPGAs

Xu, Simin; Fahmy, Suhaib A.; McLoughlin, Ian

doi:10.1145/2554688.2554779

Cited by 11 publications

(5 citation statements)

References 29 publications

(33 reference statements)

“…We also compare the results of folding level 0, which can be considered equivalent to a single context FPGA, against the results on an Altera Stratix V (5SGSMD4E1H29C1) device that features 6-input fracturable LUTs and variable precision DSPs, as shown in Table 2. To account for the difference in platforms, we compute the effective area utilized by the implementation (in terms of equivalent LUTs) using the relation $A_{\mathrm{Eff}} = \frac{\mathrm{LUT}_{\mathrm{Max}}}{\mathrm{DSP}_{\mathrm{Max}}} \times \mathrm{DSP}_{\mathrm{utilization}} + \mathrm{LUT}_{\mathrm{utilization}}$ [13]. Also, the number of multiply/MAC and add/sub operations in each benchmark is shown (in brackets), which helps determine the reduction in DSP blocks achieved by exploiting their fracturable nature (on Stratix V and on the proposed DSP block in NATURE).…”
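
The effective-area relation quoted above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name, parameter names, and the device and utilization figures are hypothetical and do not come from the cited paper.

```python
def effective_area(lut_used: int, dsp_used: int, lut_max: int, dsp_max: int) -> float:
    """Return area in equivalent LUTs.

    Each DSP block in use is weighted by the device's LUT-to-DSP ratio
    (lut_max / dsp_max), then the LUTs in use are added on top.
    """
    if dsp_max <= 0:
        raise ValueError("dsp_max must be positive")
    return (lut_max / dsp_max) * dsp_used + lut_used


# Hypothetical figures, chosen only to make the arithmetic easy to follow:
# a device with 100,000 LUTs and 250 DSP blocks, and a design that uses
# 1,200 LUTs and 8 DSP blocks.
area = effective_area(lut_used=1200, dsp_used=8, lut_max=100_000, dsp_max=250)
print(area)  # 400 equivalent LUTs per DSP block * 8 + 1200
```

Weighting DSP blocks by the device-wide LUT-to-DSP ratio gives a single number that can be compared across platforms with different resource mixes, which appears to be the purpose of the relation in the quoted passage.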

Section: Performance Results and Discussion (mentioning)

confidence: 99%

Fracturable DSP Block for Multi-context Reconfigurable Architectures

Warrier

Shreejith

Zhang

et al. 2016

Circuits Syst Signal Process

Self Cite

Multi-context architectures like NATURE enable low-power applications to leverage fast context switching for improved energy efficiency and lower area footprint. The NATURE architecture incorporates 16-bit reconfigurable DSP blocks for accelerating arithmetic computations; however, their fixed precision prevents efficient re-use in mixed-width arithmetic circuits. This paper presents an improved DSP block architecture for NATURE, with native support for temporal folding and run-time fracturability. The proposed DSP block can compute multiple sub-width operations in the same clock cycle and can dynamically switch between sub-width and full-width operations in different cycles. The NanoMap tool for mapping circuits onto NATURE is extended to exploit the fracturable multiplier unit incorporated in the DSP block. We demonstrate the efficiency of the proposed dynamically fracturable DSP block by implementing logic-intensive and compute-intensive benchmark applications. Our results illustrate that the fracturable DSP block can achieve a 53.7% reduction in DSP block utilization and a 42.5% reduction in area with a 122.5% reduction in power-delay product without exploiting logic folding. We also observe an average reduction of 6.43% in power-delay product for circuits that utilize NATURE's temporal folding compared to the existing full precision DSP block in NATURE, leading to highly compact, energy efficient designs.
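
To make the idea of a fracturable multiplier concrete, the following Python sketch models a 16-bit unsigned multiplier that either produces one full-width product or two independent 8-bit products in a single step. It is a behavioural toy under stated assumptions: the 8/8 split, the unsigned operands, and the names `split16` and `dsp_multiply` are all hypothetical, and the abstract does not say that the proposed block is built this way.

```python
MASK8 = 0xFF
MASK16 = 0xFFFF


def split16(x: int) -> tuple[int, int]:
    """Split an unsigned 16-bit operand into (high byte, low byte)."""
    return (x >> 8) & MASK8, x & MASK8


def dsp_multiply(a: int, b: int, fractured: bool) -> tuple[int, ...]:
    """Behavioural model of one multiplier step (assumed design, unsigned only).

    fractured=True  -> two independent 8x8 products: (a_hi*b_hi, a_lo*b_lo)
    fractured=False -> one 16x16 product, assembled from four 8x8 partial products
    """
    if not (0 <= a <= MASK16 and 0 <= b <= MASK16):
        raise ValueError("operands must be unsigned 16-bit values")
    a_hi, a_lo = split16(a)
    b_hi, b_lo = split16(b)
    if fractured:
        # Sub-width mode: the two halves are treated as unrelated operand pairs.
        return (a_hi * b_hi, a_lo * b_lo)
    # Full-width mode: the same 8x8 partial products are shifted and summed.
    high = (a_hi * b_hi) << 16
    cross = (a_hi * b_lo + a_lo * b_hi) << 8
    low = a_lo * b_lo
    return (high + cross + low,)


# The mode flag can change from one call to the next, mimicking a per-cycle switch.
print(dsp_multiply(0x1234, 0x5678, fractured=False))
print(dsp_multiply(0x1234, 0x5678, fractured=True))
```

The point of the sketch is that the full-width product already contains the sub-width products as partial terms, so exposing them as separate results lets one block stand in for two narrower multipliers.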

“…This requires the datapaths to be manually tailored around the low-level structure of the DSP block, maximizing use of supported features. More general application to polynomial evaluation has also been proposed, again with detailed low-level optimization around DSP block structure [8]. The flexible DSP blocks in Xilinx FPGAs have also been exploited as the main functional unit in a soft processor [9].…”

Section: Related Work (mentioning)

confidence: 99%

Mapping for Maximum Performance on FPGA DSP Blocks

Ronak

Fahmy

2016

IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.

Self Cite

“…DSP blocks are more power efficient, operate at a higher frequency, and consume less area than the equivalent operations implemented using the logic fabric. As such, they are heavily used in the pipelined datapaths of computationally intensive applications [de Dinechin and Pasca 2011; Xu et al. 2014]. However, we have found that DSP block inference by the synthesis tools can be suboptimal [Ronak and Fahmy 2012] and the dynamic programmability feature is not mapped except in very restricted cases.…”

Section: Introduction (mentioning)

confidence: 99%

The iDEA DSP Block-Based Soft Processor for FPGAs

Cheah

Brosser

Fahmy

et al. 2014

ACM Trans. Reconfigurable Technol. Syst.

Self Cite

DSP blocks in modern FPGAs can be used for a wide range of arithmetic functions, offering increased performance while saving logic resources for other uses. They have evolved to better support a plethora of signal processing tasks, meaning that in other application domains they may be underutilised. The DSP48E1 primitives in new Xilinx devices support dynamic programmability that can help extend their usefulness; the specific function of a DSP block can be modified on a cycle-by-cycle basis. However, the standard synthesis flow does not leverage this flexibility in the vast majority of cases. The lean DSP Extension Architecture (iDEA) presented in this article builds around the dynamic programmability of a single DSP48E1 primitive, with minimal additional logic to create a general-purpose processor supporting a full instruction-set architecture. The result is a very compact, fast processor that can execute a full gamut of general machine instructions. We show a number of simple applications compiled using an MIPS compiler and translated to the iDEA instruction set, comparing with a Xilinx MicroBlaze to show estimated performance figures. Being based on the DSP48E1, this processor can be deployed across next-generation Xilinx Artix-7, Kintex-7, Virtex-7, and Zynq families.
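
The cycle-by-cycle programmability described in this abstract can be illustrated with a small behavioural model in Python. This is a toy under loud assumptions: the `Op` names, the `ToyDsp` class, and the three operations are hypothetical, the model ignores pipelining and signed arithmetic, and it does not reproduce the DSP48E1's actual control encodings or the iDEA instruction set. The only hardware detail it borrows is a 48-bit result register.

```python
from enum import Enum

MASK48 = (1 << 48) - 1  # results wrap to 48 bits, like a 48-bit output register


class Op(Enum):
    """Hypothetical per-cycle function selectors (not real control encodings)."""
    MUL = "mul"          # P = A * B
    MUL_ADD = "mul_add"  # P = A * B + C
    MUL_ACC = "mul_acc"  # P = P + A * B


class ToyDsp:
    """One arithmetic block whose function is chosen anew on every step."""

    def __init__(self) -> None:
        self.p = 0  # result register

    def step(self, op: Op, a: int = 0, b: int = 0, c: int = 0) -> int:
        if op is Op.MUL:
            result = a * b
        elif op is Op.MUL_ADD:
            result = a * b + c
        elif op is Op.MUL_ACC:
            result = self.p + a * b
        else:
            raise ValueError(f"unsupported op: {op}")
        self.p = result & MASK48
        return self.p


# A three-step "program": the same block performs a different function each cycle.
dsp = ToyDsp()
program = [(Op.MUL, 3, 4, 0), (Op.MUL_ACC, 5, 6, 0), (Op.MUL_ADD, 2, 10, 1)]
for op, a, b, c in program:
    print(op.name, dsp.step(op, a, b, c))
```

Driving the selector from an instruction stream, as the loop does here, is the general idea behind using a single arithmetic block as the execution unit of a small processor.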
