2021
DOI: 10.1109/tcad.2020.2994256

CNN-on-AWS: Efficient Allocation of Multikernel Applications on Multi-FPGA Platforms

Abstract: Multi-FPGA platforms, like Amazon AWS F1, can run multi-kernel pipelined applications, such as Convolutional Neural Networks (CNNs), in the cloud with excellent performance and lower energy consumption than CPUs or GPUs. We propose a method to efficiently map these applications onto multi-FPGA platforms to maximize the application throughput. Our methodology finds, for the given resources, the optimal number of parallel instances of each kernel in the pipeline and their allocation to one or more among the available…
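To make the allocation problem concrete, here is a minimal Python sketch of the idea described in the abstract: enumerate the number of parallel instances of each pipeline kernel, check that the instances can be packed into the available FPGAs, and keep the combination with the highest pipeline throughput. This is an illustrative brute-force search, not the authors' method; all kernel latencies, resource costs, and FPGA capacities below are made-up placeholders.

```python
# Minimal sketch (not the paper's algorithm): choose how many parallel instances
# of each pipeline kernel to create, subject to a per-FPGA resource budget,
# so that the throughput of the slowest stage is maximized.
from itertools import product

kernels = {            # hypothetical per-instance (latency in ms, resource cost)
    "conv1": (4.0, 30),
    "conv2": (6.0, 40),
    "fc":    (2.0, 20),
}
NUM_FPGAS, CAPACITY = 2, 100   # assumed platform: 2 FPGAs, 100 resource units each

def throughput(counts):
    # Pipeline throughput is limited by the slowest replicated stage.
    return min(c / lat for (lat, _), c in zip(kernels.values(), counts))

def fits(counts):
    # First-fit check: can all instances be packed into the available FPGAs?
    free = [CAPACITY] * NUM_FPGAS
    for (_, cost), c in zip(kernels.values(), counts):
        for _ in range(c):
            for f in range(NUM_FPGAS):
                if free[f] >= cost:
                    free[f] -= cost
                    break
            else:
                return False
    return True

best = max(
    (c for c in product(range(1, 6), repeat=len(kernels)) if fits(c)),
    key=throughput,
)
print(dict(zip(kernels, best)), "throughput =", round(throughput(best), 3), "items/ms")
```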

Cited by 14 publications (8 citation statements)
References 17 publications
“…If the number of elements in S_BIT is n and the number of newly generated rectangles is N_NEW, then N_NEW satisfies N_NEW ≤ 2n − 1. A combination of the two edges that satisfy E_i < E_j produces a new rectangle whose side length can be expressed as Formulation (4).…”
Section: Build the Sequence of Algorithm (mentioning)
confidence: 99%
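As a quick numeric illustration of the bound quoted above (our own check, not part of the cited paper):

```python
# Evaluate the quoted upper bound N_NEW <= 2*n - 1 for a few sizes of S_BIT.
# Formulation (4) is not reproduced in the excerpt, so only the bound is shown.
for n in (1, 2, 4, 8):
    print(f"|S_BIT| = {n}: at most {2 * n - 1} newly generated rectangles")
```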
“…Field programmable gate arrays (FPGAs) are gradually replacing x86 CPUs or GPUs in high-performance computing platforms in resource-constrained environments due to their low power consumption, high parallelism, and fast computing speed [1,2]. As applications become larger and more complex, system-on-chip (SoC) architectures consisting of multiple FPGAs (multi-FPGA) that combine faster inter-chip interconnections to form larger, more computationally intensive units have become popular [3,4]. In addition, the Dynamic Partial Reconfiguration (DPR) technology of FPGAs allows tasks to be dynamically configured onto different reconfigurable partitions at runtime [5], further increasing the flexibility of multi-FPGA systems and virtually increasing the availability of hardware resources [6,7].…”
Section: Introduction (mentioning)
confidence: 99%
“…Shan et al. introduce [172] a CNN multi-kernel application and its implementation on AWS-F1, where an analytical model is used to compute data transfers (CPU to DDR, DDR to FPGA, FPGA to DDR, and DDR to CPU) and kernel computation times.…”
Section: A. Models (mentioning)
confidence: 99%
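The analytical model mentioned above can be illustrated with a simple back-of-the-envelope sketch that sums the four quoted data movements and the kernel time; the bandwidths and sizes used here are placeholder assumptions, not values from the cited work.

```python
# Toy latency model (illustrative only): end-to-end time of one inference as the
# sum of the four data movements quoted above plus the kernel computation time.
PCIE_GBPS = 12.0    # assumed effective CPU<->DDR (PCIe) bandwidth, GB/s
DDR_GBPS  = 19.0    # assumed effective DDR<->FPGA bandwidth, GB/s

def transfer_ms(size_gb, gbps):
    return 1000.0 * size_gb / gbps

def inference_ms(in_gb, out_gb, kernel_ms):
    cpu_to_ddr  = transfer_ms(in_gb,  PCIE_GBPS)   # CPU  -> DDR
    ddr_to_fpga = transfer_ms(in_gb,  DDR_GBPS)    # DDR  -> FPGA
    fpga_to_ddr = transfer_ms(out_gb, DDR_GBPS)    # FPGA -> DDR
    ddr_to_cpu  = transfer_ms(out_gb, PCIE_GBPS)   # DDR  -> CPU
    return cpu_to_ddr + ddr_to_fpga + kernel_ms + fpga_to_ddr + ddr_to_cpu

print(f"{inference_ms(0.05, 0.001, 8.0):.2f} ms")  # 50 MB in, 1 MB out, 8 ms kernel
```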
“…While the computation time remains constant for up to A_max antennas, the time required to transfer all the coefficients from the host to the DDR memories via the PCIe bus grows proportionally to the number of FPGAs because of the inevitable data duplication [30]. (As the initial values of E and H fields are zero, there is no need to take them into consideration.)…”
Section: FDTD Performance on Multiple FPGAs (mentioning)
confidence: 99%
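The scaling behaviour quoted above (constant compute time, transfer time growing with the number of FPGAs because the coefficients are duplicated into every FPGA's DDR) can be sketched as follows; the coefficient size, PCIe bandwidth, and compute time are assumed placeholder values, not figures from the cited work.

```python
# Illustrative scaling (assumed numbers): coefficients are duplicated to each
# FPGA's DDR over the shared PCIe bus, so host-to-DDR transfer time grows with
# the FPGA count, while per-FPGA computation time stays constant up to A_max.
COEFF_GB, PCIE_GBPS, COMPUTE_MS = 0.5, 12.0, 40.0   # placeholder values

for num_fpgas in (1, 2, 4, 8):
    transfer_ms = 1000.0 * COEFF_GB * num_fpgas / PCIE_GBPS
    print(f"{num_fpgas} FPGA(s): transfer ~{transfer_ms:.0f} ms, compute ~{COMPUTE_MS:.0f} ms")
```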