Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
DOI: 10.1145/3020078.3021744

FINN: A Framework for Fast, Scalable Binarized Neural Network Inference

Abstract: Research has shown that convolutional neural networks contain significant redundancy, and high classification accuracy can be obtained even when weights and activations are reduced from floating point to binary values. In this paper, we present Finn, a framework for building fast and flexible FPGA accelerators using a flexible heterogeneous streaming architecture. By utilizing a novel set of optimizations that enable efficient mapping of binarized neural networks to hardware, we implement fully connected, conv…
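With binary weights and activations, the multiply-accumulate at the heart of each layer reduces to bitwise operations, which is what makes the hardware mapping mentioned in the abstract efficient. The sketch below illustrates the standard XNOR-popcount formulation of a binarized dot product used by accelerators of this kind; the NumPy code, helper names, and vector length are illustrative assumptions, not details of the FINN implementation.

import numpy as np

def binarize(x):
    # Map real values to {-1, +1} by sign (zero is treated as +1).
    return np.where(x >= 0, 1, -1).astype(np.int8)

def xnor_popcount_dot(w_bits, a_bits):
    # Dot product of two {-1, +1} vectors computed the way a binarized
    # accelerator would: XNOR the 0/1 bit encodings, count the agreements
    # (popcount), then rescale: dot = 2 * popcount - N.
    n = len(w_bits)
    agreements = int((~(w_bits ^ a_bits) & 1).sum())
    return 2 * agreements - n

# Cross-check against an ordinary dot product on the {-1, +1} values.
rng = np.random.default_rng(0)
w = binarize(rng.standard_normal(64))
a = binarize(rng.standard_normal(64))
w_bits = ((w + 1) // 2).astype(np.uint8)   # -1 -> 0, +1 -> 1
a_bits = ((a + 1) // 2).astype(np.uint8)
assert xnor_popcount_dot(w_bits, a_bits) == int(np.dot(w, a))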

Cited by 714 publications (123 citation statements)
References 22 publications
“…The hardware metrics of the residual network accelerator (RNA) are shown in Table 3, while the details of resource utilization for each module are shown in [14] and [24]. Our accelerator with 4-bit logarithmic parameters outperforms the 16-bit fixed-point accelerators in terms of performance and accuracy, with a 3−7× speedup and a 1%−4% accuracy gain, proving that our accelerator can effectively accelerate residual networks. Some high-throughput accelerators from [22] and [16], performing binarized neural network inference, outperform our accelerator in performance but cause a significant accuracy reduction. Moreover, our accelerator can achieve [23], [25].…”
Section: Performance of RNA Implementation
confidence: 97%
“…The benefits, in terms of power and throughput, of fitting a design on-chip were described in [3]. [Figure: Example of a reconfigurable multiplier with the coefficient set {12305, 20746}.] Other FPGA architectures have been implemented to utilize the highly amenable nature of CNNs, which constrain weight parameters to only binary or ternary representations [29], [30].…”
Section: Related Work
confidence: 99%
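As a point of reference for the binary/ternary weight constraint mentioned in the excerpt above, here is a minimal sketch of ternary weight quantization; the function name and threshold value are illustrative assumptions, not details taken from [29] or [30].

import numpy as np

def ternarize(w, threshold=0.05):
    # Constrain weights to {-1, 0, +1}: zero out small magnitudes,
    # keep only the sign of the rest. The threshold is an arbitrary
    # example value, not one from the cited works.
    q = np.sign(w)
    q[np.abs(w) < threshold] = 0
    return q.astype(np.int8)

w = np.array([0.42, -0.03, -0.76, 0.01, 0.10])
print(ternarize(w))   # [ 1  0 -1  0  1]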
“…m itself is bounded by its arbitrary unsigned 2's complement fixed-point representation, where f is the number of fractional bits, and hence m = 2^(k−f) − 2^(−f). A summary of the training process is given in Algorithm 1, which is similar to [3] and [6], with the addition of distribution matching and incorporating the quantization scheme of (5).…”
Section: Algorithm 1: Training a CNN Using AddNet Representations
confidence: 99%
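As a quick numeric check of the bound quoted above, the snippet below evaluates m = 2^(k−f) − 2^(−f), the largest value an unsigned fixed-point word with k total bits and f fractional bits can hold; k = 8 and f = 4 are arbitrary example values, not parameters from the cited work.

# Largest value representable by an unsigned fixed-point word with
# k total bits and f fractional bits (all bits set):
#   m = 2**(k - f) - 2**(-f)
k, f = 8, 4                                  # illustrative word length only
m = 2 ** (k - f) - 2 ** (-f)
print(m)                                     # 15.9375
assert m == (2 ** k - 1) / 2 ** f            # same bound written as (2^k - 1) * 2^(-f)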