Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays 2014
DOI: 10.1145/2554688.2554779

Square-rich fixed point polynomial evaluation on FPGAs

Abstract: Polynomial evaluation is important across a wide range of application domains, so significant work has been done on accelerating its computation. The conventional algorithm, referred to as Horner's rule, involves the least number of steps but can lead to increased latency due to serial computation. Parallel evaluation algorithms such as Estrin's method have shorter latency than Horner's rule, but achieve this at the expense of large hardware overhead. This paper presents an efficient polynomial evaluation algo…
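
The trade-off the abstract describes between Horner's rule and Estrin's method can be illustrated with a minimal Python sketch. It uses plain integers rather than fixed-point hardware arithmetic, and the function names `horner` and `estrin` are chosen here for illustration only. The paper's own square-rich algorithm is not shown, because the abstract above is truncated before it is described.

```python
def horner(coeffs, x):
    """Evaluate sum(coeffs[i] * x**i) with Horner's rule.

    Each step depends on the previous one, so the n multiply-adds
    form a single serial chain.
    """
    acc = 0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


def estrin(coeffs, x):
    """Evaluate the same polynomial with Estrin's method.

    Coefficients are combined pairwise as (even + odd * power); all pairs
    within one level are independent of each other, so hardware could
    compute them in parallel. The power is squared between levels.
    """
    terms = list(coeffs)
    if not terms:
        return 0
    power = x
    while len(terms) > 1:
        if len(terms) % 2:
            terms.append(0)  # pad so every term has a partner
        terms = [terms[i] + terms[i + 1] * power
                 for i in range(0, len(terms), 2)]
        power = power * power  # x, x^2, x^4, ...
    return terms[0]


# 1 + 2x + 3x^2 + 4x^3 + 5x^4 evaluated at x = 2
coeffs = [1, 2, 3, 4, 5]
print(horner(coeffs, 2), estrin(coeffs, 2))
```

Both functions return the same value. The difference is in the dependency structure: Horner's loop is one long chain, while each Estrin level halves the number of remaining terms at the cost of extra multipliers for the pair products and the repeated squaring.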

Citations: cited by 11 publications (5 citation statements)
References: 29 publications (33 reference statements)
“…We also compare the results of folding level 0, which can be considered equivalent to a single context FPGA, against the results on an Altera Stratix V (5SGSMD4E1H29C1) device that features 6-input fracturable LUTs and variable precision DSPs, as shown in Table 2. To account for the difference in platforms, we compute the effective area utilized by the implementation (in terms of equivalent LUTs) using the relation $A_{\mathrm{Eff}} = (\mathrm{LUT}_{\mathrm{Max}} / \mathrm{DSP}_{\mathrm{Max}}) \times \mathrm{DSP}_{\mathrm{utilization}} + \mathrm{LUT}_{\mathrm{utilization}}$ [13]. Also, the number of multiply/MAC and add/sub operations in each benchmark is shown (in brackets), which helps determine the reduction in DSP blocks achieved by exploiting their fracturable nature (on Stratix V and on the proposed DSP block in NATURE).…”
Section: Performance Results and Discussion (mentioning)
confidence: 99%
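
The effective-area relation quoted above can be sketched as a small Python helper. The function name `effective_area` and the numbers in the usage line are hypothetical and are not figures for any real device; the operator grouping assumes the ratio LUT Max / DSP Max is formed first, which is the natural reading of the quoted relation.

```python
def effective_area(lut_max, dsp_max, dsp_used, lut_used):
    """Effective area in equivalent LUTs.

    Each DSP block in use is weighted by the device's LUT-to-DSP ratio
    (lut_max / dsp_max), then the LUTs in use are added.
    """
    return lut_max / dsp_max * dsp_used + lut_used


# Made-up device totals and utilization, for illustration only.
print(effective_area(lut_max=1000, dsp_max=10, dsp_used=3, lut_used=50))
```

Weighting DSP blocks by the device-wide ratio lets designs that trade LUTs for DSP blocks be compared on a single scale.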
“…This requires the datapaths to be manually tailored around the low-level structure of the DSP block, maximizing use of supported features. More general application to polynomial evaluation has also been proposed, again with detailed low-level optimization around DSP block structure [8]. The flexible DSP blocks in Xilinx FPGAs have also been exploited as the main functional unit in a soft processor [9].…”
Section: Related Work (mentioning)
confidence: 99%
“…DSP blocks are more power efficient, operate at a higher frequency, and consume less area than the equivalent operations implemented using the logic fabric. As such, they are heavily used in the pipelined datapaths of computationally intensive applications [de Dinechin and Pasca 2011; Xu et al. 2014]. However, we have found that DSP block inference by the synthesis tools can be suboptimal [Ronak and Fahmy 2012] and the dynamic programmability feature is not mapped except in very restricted cases.…”
Section: Introduction (mentioning)
confidence: 99%