2018 IEEE 29th International Conference on Application-Specific Systems, Architectures and Processors (ASAP)
DOI: 10.1109/asap.2018.8445087

Reconfigurable co-processor architecture with limited numerical precision to accelerate deep convolutional neural networks

Abstract: Convolutional Neural Networks (CNNs) are widely used in deep learning applications, e.g., visual systems and robotics. However, existing software solutions are not efficient. Therefore, many hardware accelerators have been proposed to optimize the performance, power, and resource utilization of the implementation. Amongst existing solutions, Field Programmable Gate Array (FPGA) based architectures provide better cost-energy-performance trade-offs as well as scalability while minimizing development time. In this paper, …
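The "limited numerical precision" in the title generally refers to fixed-point quantization of CNN weights and activations, which shrinks multipliers and on-chip memory on an FPGA. The full text is not shown here, so the following is only a minimal illustration of the general idea, not the paper's actual scheme; the Q4.4 format and the `quantize_fixed_point` helper are assumptions for this sketch:

```python
import numpy as np

def quantize_fixed_point(x, int_bits=4, frac_bits=4):
    """Quantize a float array to a signed fixed-point Q(int_bits).(frac_bits) grid.

    Each value is rounded to the nearest multiple of 2**-frac_bits and
    saturated to the range representable in (int_bits + frac_bits) signed bits.
    """
    scale = 2.0 ** frac_bits
    lo = -(2.0 ** (int_bits - 1))               # most negative representable value
    hi = (2.0 ** (int_bits - 1)) - 1.0 / scale  # most positive representable value
    return np.clip(np.round(x * scale) / scale, lo, hi)

# Example: 8-bit Q4.4 has a step of 0.0625 and saturates at [-8, 7.9375].
weights = np.array([0.7513, -1.3262, 0.0631, 9.99])
q = quantize_fixed_point(weights, int_bits=4, frac_bits=4)
# → [0.75, -1.3125, 0.0625, 7.9375]
```

In hardware, such a representation lets a multiply-accumulate unit use a small integer multiplier instead of a floating-point unit, which is the usual source of the resource and power savings the abstract alludes to.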


Cited by 7 publications (2 citation statements)
References 11 publications
“…Traditional CPU designs are challenged in meeting the computational requirement of big data analytics applications. Accelerators for such applications on Field Programmable Logic Arrays (FPGAs) have become an attractive solution due to their massive parallelism, low power consumption, and cost-efficiency [1] [2]. Unfortunately, effective external DRAM memory bandwidth and access latency have become the bottleneck in such accelerators [3] [4].…”
Section: Introduction
confidence: 99%
“…This makes them extremely demanding in terms of silicon real estate, especially memory, as well as compute performance and power. To bring this computation closer to the edge in resource-constrained devices, recently there has been considerable interest in building special-purpose hardware accelerators to support inference [3]- [6], training [7], [8] as well as compilers to bridge the gap between software simulation and hardware acceleration [9]. However, while microarchitectural techniques have been able to improve on the efficiency of neural network processing, it is nowhere near the biological neocortex, which is not only substantially deeper and wider but is also significantly more efficient in terms of energy and data [10].…”
Section: Introduction
confidence: 99%