Future applications for embedded systems demand chip multiprocessor designs to meet real-time deadlines. The large number of applications in these systems generates an exponential number of use-cases. The key design automation challenges are designing systems for these use-cases and fast exploration of software and hardware implementation alternatives with accurate performance evaluation of these use-cases. These challenges cannot be overcome by current design methodologies which are semiautomated, time consuming, and error prone. In this article, we present a design methodology to generate multiprocessor systems in a systematic and fully automated way for multiple use-cases. Techniques are presented to merge multiple use-cases into one hardware design to minimize cost and design time, making it well suited for fast design-space exploration (DSE) in MPSoC systems. Heuristics to partition use-cases are also presented such that each partition can fit in an FPGA, and all use-cases can be catered for. The proposed methodology is implemented into a tool for Xilinx FPGAs for evaluation. The tool is also made available online for the benefit of the research community and is used to carry out a DSE case study with multiple use-cases of real-life applications: H263 and JPEG decoders. The generation of the entire design takes about 100 ms, and the whole DSE was completed in 45 minutes, including FPGA mapping and synthesis. The heuristics used for use-case partitioning reduce the design-exploration time elevenfold in a case study with mobile-phone applications.
In neonatal intensive care units in hospitals, vital signs of neonates are monitored continuously using wired sensors. However, these wired sensors cause skin irritation, pain, discomfort and sleep disruption for the neonates. State-of-the-art camera-based vital sign algorithms are becoming popular as a solution to these issues. However, there are limited investigations into the feasibility of monitoring neonates in a clinical setting with these algorithms. Also, the recent emergence of a wide variety of wearable head-mounted devices, such as Google Glass, enables ubiquitous vital sign monitoring. Again, the feasibility of using such a device for vital sign monitoring is unknown. This paper investigates both the feasibility of using a camera-based algorithm for pulse rate monitoring of neonates in a clinical setting and the feasibility of using Google Glass for such pulse rate monitoring. The results of our research show under which conditions pulse rate monitoring of neonates would be reliable and highlight the challenging conditions. They also give insights into the applicability of a Google Glass prototype for pulse rate monitoring and its current limitations.
We report a new time-resolved optical measurement method which combines single photon counting and the spread spectrum time-resolved optical measurement method. A laser diode modulated with pseudo-random bit sequences replaces the short pulse laser used in conventional time-resolved optical systems, while a single photon detector records the pulse sequence in response to the modulated excitation. Periodic cross-correlation is used to retrieve the impulse response. Feasibility of our approach is validated experimentally. A rise time around 150 picoseconds has been achieved with our prototype. Besides high temporal resolution, the new method also affords other benefits such as high photon counting rate, fast data acquisition, portability, and low cost.
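The cross-correlation step can be illustrated with a minimal, noiseless sketch. It is not the authors' implementation: the 127-bit sequence, the LFSR recurrence, the toy impulse response, and all function names below are assumptions chosen for illustration. The sketch relies on a standard property of maximal-length sequences: correlating the on/off excitation with its bipolar (+1/-1) copy gives (N + 1)/2 at zero lag and 0 at every other lag, so the periodic cross-correlation returns the impulse response up to that scale factor.

```python
def prbs7(length=127):
    """One period of a maximal-length sequence from a 7-bit LFSR.

    Assumed recurrence: a[n+7] = a[n] XOR a[n+1] (period 127).
    """
    state = [1] * 7
    bits = []
    for _ in range(length):
        bits.append(state[-1])                  # output the last register bit
        feedback = state[-1] ^ state[-2]
        state = [feedback] + state[:-1]         # shift and insert feedback
    return bits


def circular_convolve(bits, response):
    """Idealized expected detector counts: on/off excitation convolved with the response."""
    n = len(bits)
    return [
        sum(response[m] * bits[(i - m) % n] for m in range(len(response)))
        for i in range(n)
    ]


def recover_impulse_response(counts, bits):
    """Periodic cross-correlation of the counts with the bipolar PRBS."""
    n = len(bits)
    bipolar = [2 * b - 1 for b in bits]         # map 0/1 to -1/+1
    scale = (n + 1) / 2                         # zero-lag correlation value
    return [
        sum(counts[i] * bipolar[(i - k) % n] for i in range(n)) / scale
        for k in range(n)
    ]


if __name__ == "__main__":
    bits = prbs7()
    toy_response = [0.0, 0.0, 1.0, 0.5, 0.25]   # hypothetical decay, one value per time bin
    counts = circular_convolve(bits, toy_response)
    estimate = recover_impulse_response(counts, bits)
    print([round(v, 6) for v in estimate[:6]])
```

In a real measurement the counts would carry photon shot noise, so the recovered response would only approximate the true one; the sketch omits noise to keep the identity visible.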
Multiprocessor systems-on-chip (MPSoC) are being developed in increasing numbers to support the high number of applications running on modern embedded systems. Designing and programming such systems prove to be a major challenge. Most of the current design methodologies rely on creating the design by hand, and are therefore error-prone and time-consuming. This also limits the number of design points that can be explored. While some efforts have been made to automate the flow and raise the abstraction level, these are still limited to single-application designs. In this paper, we present a design methodology to generate and program MPSoC designs in a systematic and automated way for multiple applications. The architecture is automatically inferred from the application specifications, and customized for it. The flow is ideal for fast design space exploration (DSE) in MPSoC systems. We present results of a case study to compute the buffer-throughput trade-offs in real-life applications, H263 and JPEG decoders. The generation of the entire project takes about 100 ms, and the whole DSE was completed in 45 minutes, including the FPGA mapping and synthesis.
Hardware accelerators in heterogeneous multiprocessor system-on-chips are becoming popular as a means of meeting performance and energy efficiency requirements of modern embedded systems. Current design methods for accelerator synthesis, such as High-Level Synthesis, are not fully automated. Therefore, time-consuming manual iterations are required to explore efficient accelerator alternatives: the programmer is still required to think in terms of the underlying architecture. In this paper, we present (AS)²: a design flow for Accelerator Synthesis using Algorithmic Skeletons. Skeletonization separates the structure of a parallel computation from an algorithm's functionality, enabling efficient implementations without requiring the programmer to have hardware knowledge. We define three such skeletons (for three image processing kernels) enabling FPGA-specific parallelization techniques and optimizations. As a case study, we present a design space exploration of these skeletons and show how multiple design points with area-performance trade-offs for the accelerators can be efficiently and rapidly synthesized. We show that (AS)² is a promising direction for accelerator synthesis as it generates a Pareto front of 8 design points in under half an hour for each of the three image processing kernels.
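To make the notion of a Pareto front of area-performance design points concrete, here is a minimal sketch. It is not part of the described design flow: the function name, the (area, runtime) encoding, and the sample numbers are hypothetical, and both objectives are assumed to be minimized.

```python
def pareto_front(points):
    """Return the design points that no other point dominates.

    Each point is an (area, runtime) tuple; smaller is better for both.
    A point is dominated if a different point is no worse in both objectives.
    """
    front = []
    for p in points:
        dominated = any(
            q != p and q[0] <= p[0] and q[1] <= p[1] for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(front)


if __name__ == "__main__":
    # Hypothetical design points: (area in LUTs, runtime in ms).
    designs = [(10, 9.0), (20, 5.0), (25, 6.0), (40, 2.0), (40, 3.0)]
    print(pareto_front(designs))
```

The quadratic scan is adequate for a handful of synthesized design points; larger explorations would typically sort by one objective first.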
Heterogeneous Multiprocessor System-on-Chips (HMPSoCs) are becoming popular as a means of meeting energy efficiency requirements of modern embedded systems. However, as these HMPSoCs run multimedia applications as well, they also need to meet real-time requirements. Designing these predictable HMPSoCs is a key challenge, as the current design methods for these platforms are either semi-automated, non-predictable, or have limited heterogeneity. In this paper, we propose a design framework to generate and program HMPSoC designs in a rapid and predictable manner. It takes the application specifications and the architecture model as input and generates the entire HMPSoC, for FPGA prototyping, that meets the throughput constraints. The experimental results show that our framework can provide a conservative bound on the worst-case throughput of the FPGA implementation. We also present results of a case study that computes the area-power trade-offs of an industrial vision application. The entire design space exploration of all configurations was completed in 8 hours. A tool-chain targeting the Xilinx Zynq FPGA is also presented.
In modern embedded systems, heterogeneous architectures are crucial in achieving desired performance requirements under area and energy constraints. Many of these systems combine a multi-processor system-on-chip and a Field Programmable Gate Array to enable hardware acceleration. Although the introduction of High-Level Synthesis significantly reduced the complexity of utilizing these systems, a programmer is still required to have expert knowledge of both the High-Level Synthesis tool and the target hardware and to perform time-consuming manual iterations to achieve efficient implementations. In this paper we present SPINE, a design flow for automatic generation of efficient hardware accelerators based on Algorithmic Species. SPINE allows the designer to focus on the algorithm by automatically applying hardware-specific optimizations and parallelization techniques to the design. As a case study, we present a design space exploration of nine different loop-nests used in image processing kernels and show how SPINE rapidly generates multiple area-performance trade-offs. Furthermore, we compare our results with the state of the art and show that SPINE is a promising direction for accelerator generation, as the average performance and area improvements with SPINE are respectively 107% and 75% over the state of the art.