2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing
DOI: 10.1109/ccgrid.2015.114

A Deep Learning Prediction Process Accelerator Based FPGA

Abstract: Recently, machine learning has been widely used in applications and cloud services, and deep learning, as an emerging field of machine learning, shows excellent ability in solving complex learning problems. To give users a better experience, high-performance implementations of deep learning applications are very important. As a common means of accelerating algorithms, FPGAs offer high performance, low power consumption, small size, and other advantages. We therefore use an FPGA to design a deep learning accelerator; the acceler…

Cited by 46 publications (20 citation statements)
References 10 publications
“…In addition, because ML models require a high level of parallelism for efficient performance, throughput is a major issue. Memory throughput can be optimized by introducing pipelining [20].…”
Section: Challenges and Optimization Opportunities in Embedded Machine Learning
confidence: 99%
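The pipelining referred to here can be read as double buffering: prefetch the next block of input while the current block is being processed, so memory latency overlaps with compute. Below is a minimal Python sketch under that assumption; `load_tile` and `compute` are hypothetical stand-ins, not functions from the cited accelerator.

```python
# Sketch of pipelined (double-buffered) processing: while one tile is
# being computed on, the next tile is prefetched in the background.
from concurrent.futures import ThreadPoolExecutor

def load_tile(i):
    # Stand-in for a DMA transfer of input tile i into on-chip buffers.
    return [i] * 1024

def compute(tile):
    # Stand-in for the accelerator's processing of one tile.
    return sum(tile)

def pipelined(num_tiles):
    results = []
    with ThreadPoolExecutor(max_workers=1) as prefetcher:
        pending = prefetcher.submit(load_tile, 0)   # prime the pipeline
        for i in range(num_tiles):
            tile = pending.result()                 # wait for current tile
            if i + 1 < num_tiles:
                pending = prefetcher.submit(load_tile, i + 1)  # prefetch next
            results.append(compute(tile))           # overlaps with prefetch
    return results

print(pipelined(4))
```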
“…However, graphics processing units (GPUs), due to their high floating-point performance and thread-level parallelism, are more suitable for training deep learning models [13]. Extensive research is actively being carried out to develop suitable hardware acceleration units using FPGAs [20, 21, 22, 23, 24, 25, 26], GPUs, ASICs, and TPUs to create heterogeneous, and sometimes distributed, systems to meet the high computational demand of deep learning models. At both the algorithm and hardware levels, optimization techniques for classical machine learning and deep learning algorithms, such as pruning, quantization, reduced precision, and hardware acceleration, are being investigated.…”
Section: Introduction
confidence: 99%
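Of the techniques named in this excerpt, quantization is easy to illustrate concretely. The sketch below shows symmetric 8-bit weight quantization as a toy example; the function names and the specific scaling rule are assumptions for illustration, not the scheme of any cited accelerator.

```python
# Toy symmetric 8-bit quantization: map the largest weight magnitude
# to 127 and store weights as int8, keeping one float scale per tensor.
import numpy as np

def quantize_int8(weights: np.ndarray):
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize(q, s)).max())
```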
“…For P² multiplication operations, P² multiplication units are used to calculate them in parallel. A classic adder tree is generally used to calculate the sum of P² numbers [28]. The classic adder tree expands the number of addends from P² to 2^⌈log₂(P²)⌉ by zero-padding; the sum of every two addends is then passed on to the next stage as its input.…”
Section: Addition Unit
confidence: 99%
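The adder tree this excerpt describes can be sketched directly: zero-pad the P² addends up to the next power of two, then sum adjacent pairs stage by stage, for ⌈log₂(P²)⌉ stages in total; in hardware, every pair within a stage is handled by its own adder in parallel. A minimal Python sketch; `adder_tree_sum` is an illustrative name.

```python
import math

def adder_tree_sum(addends):
    vals = list(addends)
    n = 1 << math.ceil(math.log2(len(vals)))   # next power of two
    vals += [0] * (n - len(vals))              # zero-padding
    while len(vals) > 1:
        # One tree stage: sum adjacent pairs. In hardware all pairs in
        # a stage are computed in parallel by separate adders.
        vals = [vals[i] + vals[i + 1] for i in range(0, len(vals), 2)]
    return vals[0]

# Example: P = 3, so P^2 = 9 addends; padded to 16, summed in 4 stages.
print(adder_tree_sum(range(1, 10)))  # -> 45
```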
“…layers with different functions, which requires suitable hardware to accelerate its inference process. Meanwhile, many emerging fields, such as intelligent robots, unmanned aerial vehicles, self-driving cars, and space probes, impose strict restrictions on the power, delay, and physical size of hardware accelerators, and traditional GPUs can hardly satisfy these requirements [7], [8]. To satisfy these strict requirements, the Field Programmable Gate Array (FPGA) has become a high-performance and flexible accelerator for CNN inference in many emerging fields [9]-[13].…”
Section: Introduction, A. Background
confidence: 99%