In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks. To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs. In this context, reconfigurable hardware in the form of FPGAs constitutes a potential alternative platform that can be integrated in the existing deep learning ecosystem to provide a tunable balance between performance, power consumption and programmability. In this paper, a survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics, which include the supported applications, architectural choices, design space exploration methods and achieved performance. Moreover, major challenges and objectives introduced by the latest trends in CNN algorithmic research are identified and presented. Finally, a uniform evaluation methodology is proposed, aiming at the comprehensive, complete and in-depth evaluation of CNN-to-FPGA toolflows.

ACM Reference format:
Stylianos I. Venieris, Alexandros Kouris and Christos-Savvas Bouganis. 2018. Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions. ACM Comput. Surv. 0, 0, Article 0 (March 2018), 36 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn
INTRODUCTION

Convolutional Neural Networks (CNNs) [47] have demonstrated remarkable performance in Artificial Intelligence (AI) tasks. Able to achieve high accuracy and frequently outperform traditional AI approaches, CNNs have been employed in a vast range of applications over the last decade, from object detection [72][53] and classification [78][82] to drone navigation [20] and autonomous driving [11][7]. While becoming the state-of-the-art algorithm in AI fields such as machine vision, CNNs are challenged by tasks of continuously increasing complexity. This leads to the design of deeper, more expressive networks, at the expense of an increase in computational and memory requirements.