2021
DOI: 10.1109/tcad.2020.2994256

CNN-on-AWS: Efficient Allocation of Multikernel Applications on Multi-FPGA Platforms

Abstract: Multi-FPGA platforms, like Amazon AWS F1, can run multi-kernel pipelined applications, such as Convolutional Neural Networks (CNNs), in the cloud with excellent performance and lower energy consumption than CPUs or GPUs. We propose a method to efficiently map these applications onto multi-FPGA platforms to maximize the application throughput. Our methodology finds, for the given resources, the optimal number of parallel instances of each kernel in the pipeline and their allocation to one or more among the available…
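To make the allocation problem concrete, here is a minimal Python sketch of the idea described in the abstract: enumerate the number of parallel instances of each pipeline kernel, check that the instances can be packed into the available FPGAs, and keep the combination with the highest pipeline throughput. This is an illustrative brute-force search, not the authors' method; all kernel latencies, resource costs, and FPGA capacities below are made-up placeholders.

```python
# Minimal sketch (not the paper's algorithm): choose how many parallel instances
# of each pipeline kernel to create, subject to a per-FPGA resource budget,
# so that the throughput of the slowest stage is maximized.
from itertools import product

kernels = {            # hypothetical per-instance (latency in ms, resource cost)
    "conv1": (4.0, 30),
    "conv2": (6.0, 40),
    "fc":    (2.0, 20),
}
NUM_FPGAS, CAPACITY = 2, 100   # assumed platform: 2 FPGAs, 100 resource units each

def throughput(counts):
    # Pipeline throughput is limited by the slowest replicated stage.
    return min(c / lat for (lat, _), c in zip(kernels.values(), counts))

def fits(counts):
    # First-fit check: can all instances be packed into the available FPGAs?
    free = [CAPACITY] * NUM_FPGAS
    for (_, cost), c in zip(kernels.values(), counts):
        for _ in range(c):
            for f in range(NUM_FPGAS):
                if free[f] >= cost:
                    free[f] -= cost
                    break
            else:
                return False
    return True

best = max(
    (c for c in product(range(1, 6), repeat=len(kernels)) if fits(c)),
    key=throughput,
)
print(dict(zip(kernels, best)), "throughput =", round(throughput(best), 3), "items/ms")
```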

Cited by 14 publications (8 citation statements)
References 17 publications
“…If the number of elements in S_BIT is n and the number of newly generated rectangles is N_NEW, then N_NEW satisfies N_NEW ≤ 2n − 1. A combination of the two edges that satisfy E_i < E_j produces a new rectangle whose side length can be expressed as Formulation (4).…”
Section: Build the Sequence of Algorithm (mentioning)
confidence: 99%
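As a quick numeric illustration of the bound quoted above (our own check, not part of the cited paper):

```python
# Evaluate the quoted upper bound N_NEW <= 2*n - 1 for a few sizes of S_BIT.
# Formulation (4) is not reproduced in the excerpt, so only the bound is shown.
for n in (1, 2, 4, 8):
    print(f"|S_BIT| = {n}: at most {2 * n - 1} newly generated rectangles")
```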
“…Field programmable gate arrays (FPGAs) are gradually replacing x86 CPUs or GPUs in high-performance computing platforms in resource-constrained environments due to their low power consumption, high parallelism, and fast computing speed [1,2]. As applications become larger and more complex, system-on-chip (SoC) architectures consisting of multiple FPGAs (multi-FPGA) that combine faster inter-chip interconnections to form larger, more computationally intensive units have become popular [3,4]. In addition, the Dynamic Partial Reconfiguration (DPR) technology of FPGAs allows tasks to be dynamically configured onto different reconfigurable partitions at runtime [5], further increasing the flexibility of multi-FPGA systems and virtually increasing the availability of hardware resources [6,7].…”
Section: Introduction (mentioning)
confidence: 99%
“…Shan et al. introduce [172] a CNN multi-kernel application and its implementation on AWS-F1, where an analytical model is used to compute data transfers (CPU to DDR, DDR to FPGA, FPGA to DDR, and DDR to CPU) and kernel computation times.…”
Section: A. Models (mentioning)
confidence: 99%
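The analytical model mentioned above can be illustrated with a simple back-of-the-envelope sketch that sums the four quoted data movements and the kernel time; the bandwidths and sizes used here are placeholder assumptions, not values from the cited work.

```python
# Toy latency model (illustrative only): end-to-end time of one inference as the
# sum of the four data movements quoted above plus the kernel computation time.
PCIE_GBPS = 12.0    # assumed effective CPU<->DDR (PCIe) bandwidth, GB/s
DDR_GBPS  = 19.0    # assumed effective DDR<->FPGA bandwidth, GB/s

def transfer_ms(size_gb, gbps):
    return 1000.0 * size_gb / gbps

def inference_ms(in_gb, out_gb, kernel_ms):
    cpu_to_ddr  = transfer_ms(in_gb,  PCIE_GBPS)   # CPU  -> DDR
    ddr_to_fpga = transfer_ms(in_gb,  DDR_GBPS)    # DDR  -> FPGA
    fpga_to_ddr = transfer_ms(out_gb, DDR_GBPS)    # FPGA -> DDR
    ddr_to_cpu  = transfer_ms(out_gb, PCIE_GBPS)   # DDR  -> CPU
    return cpu_to_ddr + ddr_to_fpga + kernel_ms + fpga_to_ddr + ddr_to_cpu

print(f"{inference_ms(0.05, 0.001, 8.0):.2f} ms")  # 50 MB in, 1 MB out, 8 ms kernel
```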
“…While the computation time remains constant for up to A_max antennas, the time required to transfer all the coefficients from the host to the DDR memories via the PCIe bus grows proportionally to the number of FPGAs because of the inevitable data duplication [30]. (As the initial values of E and H fields are zero, there is no need to take them into consideration.)…”
Section: FDTD Performance on Multiple FPGAs (mentioning)
confidence: 99%
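The scaling behaviour quoted above (constant compute time, transfer time growing with the number of FPGAs because the coefficients are duplicated into every FPGA's DDR) can be sketched as follows; the coefficient size, PCIe bandwidth, and compute time are assumed placeholder values, not figures from the cited work.

```python
# Illustrative scaling (assumed numbers): coefficients are duplicated to each
# FPGA's DDR over the shared PCIe bus, so host-to-DDR transfer time grows with
# the FPGA count, while per-FPGA computation time stays constant up to A_max.
COEFF_GB, PCIE_GBPS, COMPUTE_MS = 0.5, 12.0, 40.0   # placeholder values

for num_fpgas in (1, 2, 4, 8):
    transfer_ms = 1000.0 * COEFF_GB * num_fpgas / PCIE_GBPS
    print(f"{num_fpgas} FPGA(s): transfer ~{transfer_ms:.0f} ms, compute ~{COMPUTE_MS:.0f} ms")
```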