IECON 2020, the 46th Annual Conference of the IEEE Industrial Electronics Society (2020)
DOI: 10.1109/iecon43393.2020.9255055
Performance Evaluation of State-of-the-Art Edge Computing Devices for DNN Inference

Cited by 4 publications (6 citation statements)
References 17 publications
“…The deep NN is trained offline; then, through pruning and quantization methods, the groups of artificial neurons that rarely or never fire are removed and the numerical precision of the weights is reduced, so that a smaller model size and faster computation are achieved at the cost of a minimal reduction in prediction accuracy [39]. Given the parallel characteristics inherent to such algorithms, an FPGA-based or GPU-based implementation is thus highly recommended [40].…”
Section: Next Generation of Smart Controllers for Electrical Energy Systems
confidence: 99%
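The prune-then-quantize workflow described in this statement can be reproduced in a few lines. Below is a minimal sketch, assuming PyTorch; the stand-in network, the 30% sparsity target, and the choice to prune only Linear layers are illustrative and not taken from the cited papers.

```python
# Minimal sketch of pruning followed by quantization, assuming PyTorch.
# The network, sparsity level, and layer selection are illustrative.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A small stand-in network; in practice this would be the trained deep NN.
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)

# Pruning: zero out the 30% of weights with the smallest L1 magnitude,
# approximating the removal of neurons that rarely or never fire.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # make the sparsity permanent

# Quantization: reduce weight precision from float32 to int8,
# shrinking the model and speeding up inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 128)
print(quantized(x).shape)  # torch.Size([1, 10])
```

Dynamic quantization converts only the weights to int8 and keeps activations in floating point, which is the lightest-weight of PyTorch's quantization modes; the accuracy cost is typically the "minimal reduction" the statement refers to.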
“…The inference time of a DPU on a ZedBoard is compared against a CPU, two GPUs, and two TPU solutions in [8]. Four common CNNs are tested: MobileNet v1, MobileNet v2, Inception v1, and Inception v3.…”
Section: Related Work
confidence: 99%
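A comparison like the one in [8] reduces to timing repeated forward passes of each network on the target device. The following is a minimal sketch, assuming TensorFlow/Keras; Inception v1 is not bundled with keras.applications, so only three of the four networks are instantiated, and the warm-up and run counts are illustrative choices.

```python
# Minimal per-model latency measurement, assuming TensorFlow/Keras running
# on whatever device backs the installed build (CPU or GPU).
import time
import numpy as np
import tensorflow as tf

MODELS = {
    "MobileNet v1": (tf.keras.applications.MobileNet, (224, 224)),
    "MobileNet v2": (tf.keras.applications.MobileNetV2, (224, 224)),
    "Inception v3": (tf.keras.applications.InceptionV3, (299, 299)),
}

for name, (ctor, size) in MODELS.items():
    model = ctor(weights=None)  # random weights; latency is weight-independent
    x = np.random.rand(1, *size, 3).astype("float32")
    model.predict(x, verbose=0)  # warm-up run to exclude graph-build time
    runs = 50
    start = time.perf_counter()
    for _ in range(runs):
        model.predict(x, verbose=0)
    mean_ms = (time.perf_counter() - start) / runs * 1e3
    print(f"{name}: {mean_ms:.1f} ms / image")
```

Averaging over many runs after a warm-up pass matters on edge devices, where first-inference overhead (graph compilation, memory allocation) can dwarf steady-state latency.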
“…In the current version of the compiler, all operations after the first unsupported operation are also mapped onto the CPU [45]. The CPU is slower than the TPU itself, which means that the model's architecture can have a large impact on its performance on the TPU [8].…”
Section: Tool Flows for TPU
confidence: 99%
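This CPU fallback is visible at runtime: an Edge TPU-compiled model is loaded into a single TFLite interpreter with the Edge TPU delegate, and any operations the compiler left unmapped (reported in the compiler's log, e.g. by the edgetpu_compiler's show-operations option, if memory serves) execute on the host CPU inside that same interpreter. The sketch below, assuming the Coral tflite_runtime package, libedgetpu, and an attached Edge TPU, shows the standard loading pattern; the model filename is hypothetical.

```python
# Minimal sketch of running an Edge TPU-compiled model with tflite_runtime.
# Ops the compiler could not map to the TPU run on the host CPU inside
# this same interpreter, which is why they dominate latency.
import numpy as np
from tflite_runtime.interpreter import Interpreter, load_delegate

interpreter = Interpreter(
    model_path="model_edgetpu.tflite",  # hypothetical compiler output
    experimental_delegates=[load_delegate("libedgetpu.so.1")],
)
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Edge TPU models are typically uint8-quantized end to end.
x = np.random.randint(0, 256, inp["shape"], dtype=np.uint8)
interpreter.set_tensor(inp["index"], x)
interpreter.invoke()
print(interpreter.get_tensor(out["index"]).shape)
```

Because everything after the first unsupported operation falls back to the CPU, placing exotic layers late in the network (or avoiding them entirely) keeps the bulk of the graph on the accelerator.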