Energy-efficient embedded computing enables new application scenarios in mobile devices, such as software-defined radio and video processing. The hierarchical multiprocessor considered in this work may contain dozens or hundreds of resource-efficient VLIW CPUs. Programming this number of CPU cores is a complex task that requires compiler support. The stream programming paradigm has properties that support automatic partitioning. This work describes a compiler for streaming applications targeting the self-built hierarchical CoreVA-MPSoC multiprocessor platform. The compiler is supported by a programming model tailored to the streaming programming paradigm. We present a novel simulated annealing (SA) based partitioning algorithm called Smart SA. Smart SA achieves an overall speedup of 12.84 on an MPSoC with 16 CPU cores compared to a single-CPU implementation. Comparison with a state-of-the-art partitioning algorithm shows an average performance improvement of 34.07%.
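To make the idea of simulated-annealing-based partitioning concrete, the following is a minimal sketch of a textbook SA loop that maps filters to cores. It is not the Smart SA algorithm itself, whose details are not given here. The cost function (load of the busiest core plus a fixed penalty per cross-core edge), the cooling parameters, and all names (`bottleneck`, `anneal`, `work`, `edges`, `comm_cost`) are assumptions chosen for illustration.

```python
import math
import random


def bottleneck(assign, work, edges, n_cores, comm_cost):
    """Assumed cost model: load of the busiest core for a given mapping."""
    load = [0.0] * n_cores
    for f, core in enumerate(assign):
        load[core] += work[f]
    # Hypothetical fixed penalty on both ends of every cross-core channel.
    for src, dst in edges:
        if assign[src] != assign[dst]:
            load[assign[src]] += comm_cost
            load[assign[dst]] += comm_cost
    return max(load)


def anneal(work, edges, n_cores, comm_cost=1.0, t_start=10.0,
           t_end=0.01, alpha=0.95, moves_per_temp=50, seed=0):
    """Generic simulated annealing over filter-to-core assignments."""
    rng = random.Random(seed)
    assign = [0] * len(work)  # start from the single-core mapping
    cost = bottleneck(assign, work, edges, n_cores, comm_cost)
    best, best_cost = assign[:], cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            f = rng.randrange(len(work))
            old = assign[f]
            assign[f] = rng.randrange(n_cores)  # move one filter
            new_cost = bottleneck(assign, work, edges, n_cores, comm_cost)
            delta = new_cost - cost
            # Always accept improvements; accept worse moves with
            # probability exp(-delta / t) to escape local minima.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = assign[:], cost
            else:
                assign[f] = old  # undo the rejected move
        t *= alpha  # geometric cooling
    return best, best_cost


# Hypothetical 8-filter pipeline with per-filter work estimates.
work = [4, 3, 2, 1, 4, 3, 2, 1]
edges = [(i, i + 1) for i in range(len(work) - 1)]
single_core_cost = float(sum(work))
best, best_cost = anneal(work, edges, n_cores=4)
speedup = single_core_cost / best_cost
```

Under this cost model, `speedup` is the ratio of the single-core cost to the best bottleneck found. For reference, the reported speedup of 12.84 on 16 cores corresponds to a parallel efficiency of roughly 12.84 / 16 ≈ 0.80.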
I. INTRODUCTION
The decreasing feature size of microelectronic circuits allows for the integration of more and more processing cores on a single chip. A Multiprocessor System-on-Chip (MPSoC) may consist of dozens of processing elements, such as CPU cores or specialized hardware accelerators, connected by a high-speed communication infrastructure, i.e. a Network-on-Chip (NoC). However, mapping general-purpose applications to a large number of MPSoC processing elements remains a nontrivial task. Manually writing low-level code for each core makes it difficult to experiment with different decompositions and mappings of computation to processors. Alternatively, higher-level programming frameworks allow the compiler to evaluate a larger design space when mapping the application to different hardware configurations. Efficient mapping algorithms are important for finding optimized solutions. The streaming paradigm provides regular and repeating computation and independent filters with explicit communication. This allows compilers to more easily exploit the task, data, and pipeline parallelism commonly found in signal processing, multimedia, network processing, cryptology, and similar application domains.
A popular stream-based programming language is StreamIt [1], [2]. The key principle of this language is to provide information about the inherent parallelism of the program by using a structured data flow graph. This graph consists of filters, pipelines, split-joins, and feedback loops.
In this paper we present a compiler for the StreamIt language targeting the self-built CoreVA-MPSoC architecture. The CoreVA-MPSoC is a highly scalable multiprocessor system based on a hierarchical communication infrastructure and the configurable VLIW processor CoreVA.
This paper is organized as follows: Section II describes our CoreVA-MPSoC hardware architecture. In Section III we discuss our StreamIt compiler with a focus on our novel simulated annealing partitioning algorithm (Smart SA).
The communication model proposed in this work is presented in S...
Parallel programming and effective partitioning of applications for embedded many-core architectures require optimization algorithms. However, these algorithms have to quickly evaluate thousands of different partitions. We present a fast performance estimator embedded in a parallelizing compiler for streaming applications. The estimator combines a single execution-based simulation with an analytic approach. Experimental results demonstrate that the estimator has a mean error of 2.6% and computes its estimate 2848 times faster than a cycle-accurate simulator.
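One way to picture the combination of a one-time measurement with an analytic model is sketched below. It is an illustration under stated assumptions, not the estimator described above. It assumes that each filter is profiled once to obtain its cycles per firing, and that the cost of any partition is then computed analytically as the load of the busiest core. The names `estimate_cycles` and `relative_error`, the per-word communication cost, and the example numbers are all hypothetical.

```python
def estimate_cycles(profile, schedule, mapping, channels, cycles_per_word):
    """Analytic estimate of cycles per steady-state iteration.

    profile:  filter name -> cycles per firing (assumed measured once)
    schedule: filter name -> firings per steady-state iteration
    mapping:  filter name -> core id (the partition under evaluation)
    channels: list of (src, dst, words per iteration)
    """
    load = {}
    for name, cycles in profile.items():
        core = mapping[name]
        load[core] = load.get(core, 0) + cycles * schedule[name]
    # Assumed model: both endpoints pay for cross-core traffic.
    for src, dst, words in channels:
        if mapping[src] != mapping[dst]:
            for name in (src, dst):
                load[mapping[name]] += words * cycles_per_word
    return max(load.values())


def relative_error(estimate, reference):
    """Relative deviation of an estimate from a reference measurement."""
    return abs(estimate - reference) / reference


# Hypothetical profile data for a three-filter pipeline.
profile = {"src": 100, "fir": 400, "sink": 50}
schedule = {"src": 1, "fir": 2, "sink": 1}
channels = [("src", "fir", 8), ("fir", "sink", 8)]

one_core = estimate_cycles(profile, schedule,
                           {"src": 0, "fir": 0, "sink": 0}, channels, 2)
two_cores = estimate_cycles(profile, schedule,
                            {"src": 0, "fir": 1, "sink": 0}, channels, 2)
```

Because each call only sums precomputed numbers, an optimizer can evaluate thousands of candidate partitions without re-running a simulation. `relative_error` shows how such an estimate could be compared against a cycle-accurate reference.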
We propose and discuss a cross-platform benchmark suite for neuromorphic hardware. The suite covers benchmarks ranging from low-level characterization to high-level application evaluation, using benchmark-specific metrics. This broad approach allows us to compare various hardware systems, including mixed-signal and fully digital neuromorphic architectures. Selected benchmarks are discussed, and results for several target platforms are presented, revealing characteristic differences between the systems. Furthermore, a proposed energy model allows benchmark performance metrics to be combined with energy efficiency. This model enables the prediction of the energy expenditure of a network on a target system without access to that system. To quantify the efficiency gap between neuromorphic systems and their biological paragon, the human brain, the energy model is used to estimate the energy required for a full brain simulation. This reveals that current neuromorphic systems are at least four orders of magnitude less efficient. It is argued that even with a modern fabrication process, a gap of two to three orders of magnitude remains. Finally, for selected benchmarks, the performance and efficiency of the neuromorphic solution are compared to standard approaches.