Convolutional Neural Networks (CNNs) are at the base of many applications, in both embedded and server-class contexts. While Graphics Processing Units (GPUs) are predominantly used for training, solutions for inference often rely on Field Programmable Gate Arrays (FPGAs), since they are more flexible and cost-efficient in many scenarios. However, existing approaches fall short of meeting several conflicting goals, such as using resources efficiently on multiple platforms while retaining deep configurability and allowing a quick Design Space Exploration (DSE) towards the best solution. This paper proposes a solution composed of highly configurable kernels designed for resource time-sharing, together with an analytical model of their resource/performance characteristics. Building on these models, we propose an Integer Linear Programming (ILP)-based approach to effectively identify Pareto-optimal kernel configurations in terms of throughput and resource consumption. We evaluate our DSE on two state-of-the-art CNNs, showing that it identifies hundreds of Pareto-optimal solutions in less than a minute. Guided by the DSE configurations for the AlexNet network, we quickly identified a candidate design for a Xilinx Virtex-7 XC7VX485T FPGA and achieved a peak throughput of 4.05 ms per image, with a measured maximum estimation error of 6.69% with respect to the proposed analytical models.
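As an illustration of what "Pareto-optimal in terms of throughput and resource consumption" means, the following minimal sketch filters a small set of candidate configurations down to its Pareto front. It is not the ILP formulation proposed in the paper: it uses a brute-force dominance check, and the `Config` type, the configuration names, and all latency and DSP figures are hypothetical values chosen only for the example.

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    """Hypothetical kernel configuration (all values are made up)."""
    name: str
    latency_ms: float  # time per image, lower is better
    dsp_used: int      # resource consumption, lower is better


def dominates(a: Config, b: Config) -> bool:
    """True if a is no worse than b on both metrics and strictly better on one."""
    no_worse = a.latency_ms <= b.latency_ms and a.dsp_used <= b.dsp_used
    strictly_better = a.latency_ms < b.latency_ms or a.dsp_used < b.dsp_used
    return no_worse and strictly_better


def pareto_front(configs: List[Config]) -> List[Config]:
    """Keep only the configurations that no other configuration dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


candidates = [
    Config("A", 10.0, 400),
    Config("B", 6.0, 900),
    Config("C", 8.0, 1000),   # dominated by B (slower and larger)
    Config("D", 4.0, 2000),
    Config("E", 12.0, 400),   # dominated by A (slower, same resources)
]

front = pareto_front(candidates)
print([c.name for c in front])  # → ['A', 'B', 'D']
```

The pairwise check is quadratic in the number of candidates, which is acceptable for a handful of points; exploring a large design space is where an optimization-based formulation, such as the ILP approach the abstract describes, becomes relevant.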
Face Detection (FD) has recently become the base of multiple applications that require low latency while operating under limited resource and energy budgets. Deep Convolutional Neural Networks (DCNNs) are especially accurate in FD, but latency requirements and energy budgets call for Field Programmable Gate Array (FPGA)-based solutions, which trade off flexibility and efficiency. Nonetheless, the range of available FPGA solutions is limited and different chips often require expensive redesign phases, while developers want solutions whose resources scale proportionally to demand. Therefore, this work presents an FD solution based on a DCNN running on a distributed, embedded system with FPGAs. It proposes a general approach to reduce the DCNN size and to design its FPGA cores, and investigates its accuracy, performance, and energy efficiency.