Abstract. In this paper, we present FaRM, a fast ICAP controller that provides high-speed configuration and easy-to-use readback capabilities while reducing configuration overhead as much as possible. To enhance performance, FaRM uses techniques such as DMA, ICAP overclocking, bitstream pre-loading into the controller, and bitstream compression based on an evolution of the Run Length Encoding algorithm. We also propose a model for estimating the reconfiguration overhead. The approach is tested with an AES encryption/decryption architecture. With proper ICAP overclocking to 200 MHz, we are able to reach the ICAP upper-bound throughput of 800 MB/s.
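As a back-of-the-envelope check on the 800 MB/s figure, the sketch below computes peak ICAP throughput from the clock frequency. It assumes a 32-bit ICAP data port accepting one word per clock cycle; the abstract does not state the port width, and the function names are hypothetical, chosen only for illustration.

```python
def icap_peak_throughput_mb_s(clock_mhz: float, port_width_bits: int = 32) -> float:
    """Peak throughput in MB/s (1 MB = 10**6 bytes).

    Assumption: one word of `port_width_bits` is written per clock cycle.
    The 32-bit default is an assumption, not a value given in the abstract.
    """
    bytes_per_cycle = port_width_bits / 8
    return clock_mhz * bytes_per_cycle


def min_reconfig_time_ms(bitstream_bytes: int, clock_mhz: float,
                         port_width_bits: int = 32) -> float:
    """Lower bound on reconfiguration time in milliseconds.

    Ignores every overhead other than the raw ICAP transfer itself.
    """
    # MB/s is numerically equal to bytes per microsecond.
    bytes_per_us = icap_peak_throughput_mb_s(clock_mhz, port_width_bits)
    return bitstream_bytes / bytes_per_us / 1000.0


if __name__ == "__main__":
    print(icap_peak_throughput_mb_s(200))     # overclocked ICAP
    print(min_reconfig_time_ms(800_000, 200)) # hypothetical 800 kB bitstream
```

Under these assumptions, 200 MHz times 4 bytes per cycle gives the 800 MB/s upper bound quoted in the abstract. The bitstream size in the usage example is a made-up value.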
Abstract. Many industrial domains rely on vision-based applications that must comply with strict performance and embedded-system requirements. TULIPP will develop a reference platform consisting of a hardware system, a tool chain and a real-time operating system. This platform defines implementation rules and interfaces to tackle power consumption issues while delivering high, energy-efficient and guaranteed computing performance for image processing applications. Using this reference platform will enable designers to develop a complete solution at reduced cost while meeting the typical embedded system requirements of size, weight and power. Moreover, for less constrained systems whose performance requirements cannot be fulfilled by one instance of the platform, the reference platform will also be scalable, so that the resulting boards can be chained for higher processing power. The instance of the reference platform developed during the project will be use-case driven and split between the implementation of a reference hardware architecture (a scalable low-power board), a low-power operating system and image processing libraries, and a productivity-enhancing tool chain. It will lead to three proof-of-concept demonstrators across different application domains: a real-time, low-power medical image processing product prototype of a surgical X-ray system (mobile C-arm); embedded image processing systems within Unmanned Aerial Vehicles (UAVs); and automotive real-time embedded systems for driver assistance. TULIPP will set up an ecosystem and work closely with standardization organizations to propose new standards derived from its reference platform to the industry.
In this paper, we present FoRTReSS, a flow enabling design space exploration for partially reconfigurable systems with real-time constraints. FoRTReSS allows estimating mixed hardware/software implementations of an application where the hardware design space, the floorplanning of reconfigurable regions placed on the FPGA, is automatically inferred from application resource information, interface constraints and the target device. Real-time constraints are verified by a highly configurable SystemC simulator, RecoSim, which handles applications described as control data flow graphs (CDFGs). We demonstrate our approach on an H.264 video decoder and an H.265 encoder targeting the latest Zynq-7000 platforms from Xilinx, which embed a dual-core Cortex-A9 processor. We show that a hardware/software implementation of the H.264 decoder using both processor cores and slice decomposition is possible under real-time constraints, achieving a frame rate of 30 frames per second while reducing area requirements compared to a static implementation, using 54 % fewer slice resources and 44 % fewer BRAM resources. Additionally, using the example of an H.265 encoder, we report the ability of the methodology to support very early analysis from a high-level application specification.
This paper describes a methodology to improve the energy efficiency of high-performance multiprocessor architectures with dynamic and partial reconfiguration (DPR), based on a thorough application study in the field of smart camera technology. Field-programmable gate arrays are increasingly being used in cameras owing to their suitability for real-time image processing with intensive, high-performance tasks and to the recent advances in dynamic reconfiguration that further improve energy efficiency. The approach used to best exploit DPR is based on a tighter coupling of two decisive elements in the problem of heterogeneous deployment: design space exploration and advanced scheduling. We show how a tight integration of exploration, energy-aware scheduling, common power models, and decision support in heterogeneous DPR multiprocessor system-on-a-chip mapping can be used to improve the energy efficiency of hardware acceleration. Applying this to a mobile vehicle license-plate tracking and recognition service results in up to a 19-fold improvement in energy efficiency compared with software multiprocessor execution (in terms of energy-delay product) and a more than threefold improvement compared with a multiprocessor with static hardware acceleration (i.e., without DPR).
KEYWORDS: dynamic partial reconfiguration, energy efficiency, multicore, manycore, smart camera
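For readers unfamiliar with the metric, the energy-delay product (EDP) is conventionally the energy consumed multiplied by the execution time, so lower is better. The sketch below shows how an "N-fold improvement" in EDP would be computed. The function names and the numbers in the usage example are hypothetical and are not taken from the paper.

```python
def energy_delay_product(energy_j: float, delay_s: float) -> float:
    """Energy-delay product in joule-seconds (lower is better)."""
    return energy_j * delay_s


def edp_improvement(baseline: tuple[float, float],
                    candidate: tuple[float, float]) -> float:
    """Ratio of baseline EDP to candidate EDP.

    Each argument is an (energy in joules, delay in seconds) pair.
    A result of 19.0 would correspond to a 19-fold improvement.
    """
    return energy_delay_product(*baseline) / energy_delay_product(*candidate)


if __name__ == "__main__":
    # Made-up measurements, for illustration only.
    software_only = (8.0, 4.0)   # 8 J over 4 s
    accelerated = (2.0, 1.0)     # 2 J over 1 s
    print(edp_improvement(software_only, accelerated))
```

Because EDP multiplies the two quantities, a design that halves both energy and delay improves EDP fourfold, which is why the metric rewards accelerators that are simultaneously faster and more frugal.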