Abstract-Convolutional neural networks (CNNs) demonstrate excellent performance in various computer vision applications. In recent years, FPGA-based CNN accelerators have been proposed for optimizing performance and power efficiency. Most accelerators are designed for object detection and recognition algorithms that are performed on low-resolution (LR) images. However, real-time image super-resolution (SR) cannot be implemented on a typical accelerator because of the long execution cycles required to generate high-resolution (HR) images, such as those used in ultra-high-definition (UHD) systems. In this paper, we propose a novel CNN accelerator with efficient parallelization methods for SR applications. First, we propose a new methodology for optimizing the deconvolutional neural networks (DCNNs) used for increasing feature maps. Secondly, we propose a novel method to optimize CNN dataflow so that the SR algorithm can be driven at low power in display applications. Finally, we quantize and compress a DCNN-based SR algorithm into an optimal model for efficient inference using on-chip memory. We present an energyefficient architecture for SR and validate our architecture on a mobile panel with quad-high-definition (QHD) resolution. Our experimental results show that, with the same hardware resources, the proposed DCNN accelerator achieves a throughput up to 108 times greater than that of a conventional DCNN accelerator. In addition, our SR system achieves an energy efficiency of 144.9 GOPS/W, 293.0 GOPS/W, and 500.2 GOPS/W at SR scale factors of 2, 3, and 4, respectively. Furthermore, we demonstrate that our system can restore HR images to a high quality while greatly reducing the data bit-width and the number of parameters compared to conventional SR algorithms.
The standard C/C++ implementation of a spatial partitioning data structure, such as octree and quadtree, is often inefficient in terms of storage requirements particularly when the memory overhead for maintaining parentto-child pointers is significant with respect to the amount of actual data in each tree node. In this work, we present a novel data structure that implements uniform spatial partitioning without storing explicit parent-tochild pointer links. Our linkless tree encodes the storage locations of subdivided nodes using perfect hashing while retaining important properties of uniform spatial partitioning trees, such as coarse-to-fine hierarchical representation, efficient storage usage, and efficient random accessibility. We demonstrate the performance of our linkless trees using image compression and path planning examples.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.