Cited by 25 publications (21 citation statements)
References 21 publications
“…3) Many overflow cases after adding error-reduction term. SIMD Accurate/Approximate Multiplier: Authors in [6,19] have shown performance/energy improvements in FPGA-based DNNs by modifying ASIC-based DSP block to perform double approximate multiplications with a common operand. Recently, [23] has proposed an approximate SIMD design (using 8x8 truncated multipliers) for ASIC platforms.…”
Section: Related Work (mentioning)
confidence: 99%
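The common-operand trick referred to in this citation statement can be sketched in plain C: two 8x8 products that share one operand are packed into a single wide multiplication, mirroring how a DSP block's wide multiplier can serve two narrow operations at once. This is only an illustrative sketch for unsigned operands, not the implementation of [6,19] or [23]; the 16-bit packing offset is an assumption chosen so the two products cannot overlap.

```c
#include <stdint.h>
#include <stdio.h>

/* Two unsigned 8x8 products that share the common operand a, computed with
 * a single wide multiplication.  b is placed 16 bits above c so that a*b
 * and a*c occupy disjoint fields of the product (a*c < 2^16 for 8-bit
 * unsigned inputs).  Signed operands would need the error-correction
 * terms discussed in the citing papers. */
static void double_mult_common_operand(uint8_t a, uint8_t b, uint8_t c,
                                       uint16_t *ab, uint16_t *ac)
{
    uint32_t packed = ((uint32_t)b << 16) | c;  /* b and c share one word */
    uint64_t prod   = (uint64_t)a * packed;     /* one multiplication     */

    *ac = (uint16_t)(prod & 0xFFFFu);           /* low field:  a*c        */
    *ab = (uint16_t)(prod >> 16);               /* high field: a*b        */
}

int main(void)
{
    uint16_t ab, ac;
    double_mult_common_operand(200, 17, 250, &ab, &ac);
    printf("a*b = %u, a*c = %u\n", (unsigned)ab, (unsigned)ac);  /* 3400, 50000 */
    return 0;
}
```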
“…Nevertheless, in spite of their advantages, hosting off-the-shelf fixed-precision DSP blocks falls short of fulfilling design requirements in a variety of domains. Besides being unable to perform division, some shortcomings that testify to their inefficiency are: 1) their fixed locations in FPGAs impose routing complexity and often result in degraded performance of some circuits [17] (and of the Viterbi decoder, Reed-Solomon and JPEG encoders discussed in [30]); 2) they cannot be efficiently utilized for multiplication precisions below 18 bits [6,19] (the comparable performance and better energy efficiency of small-scale LUT-based multipliers over DSP blocks further encourages their deployment in e.g. neural networks); 3) their limited ratio versus LUTs (<0.001) in multiplication-intensive applications or concurrently executing programs.…”
Section: Introduction (mentioning)
confidence: 99%
“…The exploitation of fixed-point representation and quantization techniques to improve throughput and power performances of hardware CNN accelerators has been widely discussed [11-14]. Previous works have variously exploited the 16-bit [26-29,34] and 8-bit [16,30,31,35-37] reduced precisions on both feature map values and parameters to realize efficient FPGA-based designs. The energy-efficient CNN accelerator proposed in [27] adopts a 16-bit representation to enable on-chip storage of parameters and partial results and to reduce power consumption associated with data transfers to/from the external memory.…”
Section: Background and Motivations (mentioning)
confidence: 99%
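As a rough illustration of the reduced-precision representations mentioned in this statement, the sketch below quantizes a floating-point feature-map value to signed 8-bit fixed point and back. The fractional-bit split (frac_bits) is a hypothetical parameter, not one taken from the cited accelerators, which typically tune the format per layer or per tensor.

```c
#include <stdint.h>
#include <stdio.h>
#include <math.h>

/* Symmetric signed 8-bit fixed-point quantization of a feature-map value:
 * q = round(x * 2^frac_bits), saturated to the int8 range.  frac_bits is
 * purely illustrative; real accelerators choose the format per layer. */
static int8_t quantize_q8(float x, int frac_bits)
{
    long q = lroundf(x * (float)(1 << frac_bits));
    if (q >  127) q =  127;   /* saturate instead of wrapping around */
    if (q < -128) q = -128;
    return (int8_t)q;
}

static float dequantize_q8(int8_t q, int frac_bits)
{
    return (float)q / (float)(1 << frac_bits);
}

int main(void)
{
    int8_t q = quantize_q8(0.73f, 6);  /* 6 fractional bits */
    printf("q = %d, reconstructed = %f\n", q, dequantize_q8(q, 6));
    return 0;
}
```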
“…Appropriately reducing the data precision to 8 bits also allows the high-performance DSP blocks to be exploited to realize fast double [16,35-37] MAC architectures. Unfortunately, the latter require auxiliary operations to correct the output of each DSP block used to perform multiplications, thus causing detrimental effects on speed performances, resource requirements and power consumption.…”
Section: Background and Motivations (mentioning)
confidence: 99%
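The auxiliary correction operation this statement refers to can be illustrated by extending the earlier unsigned packing sketch to signed operands: when the lower packed product is negative, its sign extension borrows from the upper field, so the upper product comes out one too small and a +1 correction term restores it. This is only a hedged sketch of the general idea; the field widths and packing offset are assumptions, not the mapping used by any specific double-MAC design.

```c
#include <stdint.h>
#include <stdio.h>

/* Signed variant of the common-operand packing.  When the lower product
 * a*c is negative, its sign extension corrupts the upper field, so the
 * upper product a*b needs a +1 correction.  Offsets are illustrative. */
static void double_mult_signed(int8_t a, int8_t b, int8_t c,
                               int16_t *ab, int16_t *ac)
{
    int32_t packed = (int32_t)b * 65536 + c;   /* b in the upper 16 bits  */
    int64_t prod   = (int64_t)a * packed;      /* one multiplication      */

    *ac = (int16_t)(prod & 0xFFFF);            /* low field:  a*c         */
    int32_t hi = (int32_t)(prod >> 16);        /* high field, maybe off by 1 */
    if (*ac < 0)
        hi += 1;                               /* error-correction term   */
    *ab = (int16_t)hi;
}

int main(void)
{
    int16_t ab, ac;
    double_mult_signed(-7, 100, 3, &ab, &ac);
    printf("a*b = %d, a*c = %d\n", ab, ac);    /* prints -700 and -21     */
    return 0;
}
```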