Document Version: Publisher's PDF, also known as Version of Record (includes final page, issue and volume numbers).

Citation for published version (APA): Gheorghita, S. V., Palkovic, M., Hamers, J., Vandecappelle, A., Mamagkakis, S., Basten, T., ... Bosschere. System scenario based design of dynamic embedded systems. (ES reports; Vol. 2007-06). Eindhoven: Technische Universiteit Eindhoven.
In the past decade, real-time embedded systems have become much more complex due to the introduction of substantial new functionality within single applications, and due to running multiple applications concurrently. This increases the dynamic nature of today's applications and systems, and tightens their constraints in terms of deadlines and energy consumption. State-of-the-art design methodologies try to cope with these novel issues by identifying the most frequently used cases and dealing with them separately, reducing the newly introduced complexity. This paper presents a generic and systematic design-time/run-time methodology for handling the dynamic nature of modern embedded systems, which can be utilized by existing design methodologies to increase their efficiency. It is based on the concept of system scenarios, which group system behaviors that are similar from a multi-dimensional cost perspective, such as resource requirements, delay, and energy consumption, in such a way that the system can be configured to exploit this cost similarity. At design time, these scenarios are individually optimized. Mechanisms for predicting the current scenario at run time and for switching between scenarios are ...
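The grouping and run-time switching described in this abstract can be illustrated with a small sketch (all names, cost numbers, and the clustering tolerance below are hypothetical, not taken from the paper): behaviors with similar cost vectors are clustered into scenarios at design time, and a lightweight selector picks the scenario, and hence its precomputed configuration, at run time.

```python
# Hypothetical sketch of system-scenario clustering and run-time selection.
# Situation names, cost vectors, and tolerance are illustrative only.

# Design time: profile each run-time situation by its cost vector
# (cycle budget, energy) and group situations whose costs are similar.
profiles = {
    "low_motion":  (1.0e6, 0.8),   # (cycles, energy in mJ) -- made-up numbers
    "mid_motion":  (1.1e6, 0.9),
    "high_motion": (2.4e6, 2.1),
    "scene_cut":   (2.5e6, 2.2),
}

def cluster_scenarios(profiles, tol=0.25):
    """Greedily group situations whose cost vectors differ by < tol (relative)."""
    scenarios = []  # each scenario: {"members": [...], "cost": (cycles, energy)}
    for name, cost in profiles.items():
        for sc in scenarios:
            ref = sc["cost"]
            if all(abs(a - b) / b < tol for a, b in zip(cost, ref)):
                sc["members"].append(name)
                break
        else:
            scenarios.append({"members": [name], "cost": cost})
    return scenarios

scenarios = cluster_scenarios(profiles)

# Run time: detect the current situation (here simply looked up) and switch
# to the configuration optimized for its scenario at design time.
def select_scenario(situation, scenarios):
    for idx, sc in enumerate(scenarios):
        if situation in sc["members"]:
            return idx, sc["cost"]
    raise KeyError(situation)  # a backup scenario would be used in practice
```

With the invented profile above, the four situations collapse into two scenarios, so only two configurations need to be stored and switched between.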
The ANTAREX project aims at expressing application self-adaptivity through a Domain Specific Language (DSL) and at run-time management and autotuning of applications for green and heterogeneous High Performance Computing (HPC) systems up to Exascale. The DSL approach allows the definition of energy-efficiency, performance, and adaptivity strategies as well as their enforcement at runtime through application autotuning and resource and power management. We show, through a mini-app extracted from one of the project application use cases, some initial exploration of application precision tuning by means of the mechanisms enabled by the DSL.
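Precision tuning of the kind explored here can be sketched generically (this is not ANTAREX or mini-app code; the kernel, tolerance, and input are invented): the same kernel is run at reduced precision, and the cheaper variant is accepted only if its error against the double-precision reference stays within a user-declared tolerance.

```python
# Generic illustration of precision autotuning; not ANTAREX code.
import struct

def to_float32(x):
    """Round a Python float (double precision) to single precision."""
    return struct.unpack("f", struct.pack("f", x))[0]

def kernel(xs, reduce_precision=False):
    """Toy kernel: running sum, optionally accumulated at single precision."""
    acc = 0.0
    for x in xs:
        acc += x
        if reduce_precision:
            acc = to_float32(acc)
    return acc

def tune(xs, tol):
    """Pick the cheapest precision whose relative error stays within tol."""
    ref = kernel(xs)
    low = kernel(xs, reduce_precision=True)
    if abs(low - ref) <= tol * abs(ref):
        return "float32", low
    return "float64", ref
```

A loose tolerance lets the tuner choose single precision; a very tight one forces it back to double, which is the trade-off a DSL-declared adaptivity strategy would control.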
A. VANDECAPPELLE, M. PALKOVIC, and F. CATTHOOR (IMEC)

Modern embedded multimedia and telecommunications systems need to store and access huge amounts of data. This becomes a critical factor for the overall energy consumption, area, and performance of these systems. Loop transformations are essential to improve data access locality and regularity in order to optimally design or utilize a memory hierarchy. However, due to abstract high-level cost functions, current loop transformation steering techniques do not take the memory platform sufficiently into account. They usually also result in only one final transformation solution. On the other hand, the loop transformation search space for real-life applications is huge, especially if the memory platform is not yet fully fixed. Use of existing loop transformation techniques will therefore typically lead to suboptimal end products. It is critical to find all interesting loop transformation instances. This can only be achieved by evaluating the effect of later design stages at the early loop transformation stage.

This article presents a fast incremental hierarchical memory-size requirement estimation technique. It estimates the influence of any given sequence of loop transformation instances on the mapping of application data onto a hierarchical memory platform. As the exact memory platform instantiation is often not yet defined at this high-level design stage, a platform-independent estimation is introduced with a Pareto curve output for each loop transformation instance. Comparison among the Pareto curves helps the designer, or a steering tool, to find all interesting loop transformation instances that might later lead to a low-power data mapping for any of the many possible memory hierarchy instances. Initially, the source code is used as input for estimation. However, performing the estimation repeatedly from the source code is too slow for large search space exploration.
An incremental approach, based on local updating of the previous result, is therefore used to handle sequences of different loop transformations. Experiments show that the initial approach takes a few seconds, which is two orders of magnitude faster than state-of-the-art solutions but still too costly to be performed interactively many times. The incremental approach typically takes
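The Pareto-curve comparison described in this abstract can be sketched as a simple dominance filter (the trade-off points below are invented for illustration): each loop-transformation instance yields a set of (memory size, power) points, only the non-dominated ones are kept, and the resulting fronts are compared.

```python
# Hypothetical sketch of the Pareto filtering used to compare loop
# transformation instances; the trade-off points are made up.

def pareto_front(points):
    """Keep points not dominated by any other (minimize both coordinates)."""
    front = []
    for p in points:
        if not any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points):
            front.append(p)
    return sorted(front)

# (memory size in KB, estimated power in mW) for one transformation instance
candidate = [(32, 9.0), (64, 5.0), (64, 7.0), (128, 4.5), (256, 4.4)]
front = pareto_front(candidate)
# dominated points such as (64, 7.0) are filtered out

# Comparing two instances: instance A is at least as good as B if every
# point on B's front is dominated by (or equal to) some point on A's front.
def covers(front_a, front_b):
    return all(any(a[0] <= b[0] and a[1] <= b[1] for a in front_a)
               for b in front_b)
```

Keeping a whole front per transformation, rather than a single cost number, is what lets a later stage pick the best instance once the memory hierarchy is finally fixed.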
Data transfers and storage are dominating contributors to the area and power consumption of all modern multimedia applications. A cost-efficient realisation of these systems can be obtained by using high-level memory optimisations. This paper demonstrates that state-of-the-art memory optimisation techniques can only partly deal with code from real-life multimedia applications. We propose a systematic preprocessing methodology that can be applied on top of the existing work, opening more opportunities for existing memory optimisation techniques. Our methodology is complemented with a postprocessing step, which eliminates the negative effects of preprocessing and may further improve the code quality. Our methodology has been applied to several real-life multimedia applications. Results show a decrease in the number of main memory accesses of up to 45.8% compared to applying only state-of-the-art techniques.
In the past decades, we have seen exponential increase of single processor core performance. This was necessary to catch up with the growing complexity of applications required by the market. Till now, the performance growth was achieved by drastic increase of the processor core clock speed, up to the current 4.3 GHz. Nowadays, further increase of performance by increasing the clock frequency is not feasible due to power dissipation, leakage and scaling problems. However, the market still demands more and more complex applications. To deal with this demand, all modern processor units consist of multiprocessor Systems-on-Chip (SoC) with multiple processing cores. The question is how to map modern applications onto these modern processors.

In this paper, we demonstrate a flow starting from a sequential MATLAB specification and going to a parallel implementation for a leading-edge 40 MHz Multiple Input Multiple Output (MIMO) Space Division Multiplexing (SDM)-Orthogonal Frequency Division Multiplexing (OFDM) application. We introduce the whole flow to the reader and focus on the parallelization part. We demonstrate both the functional parallel model as well as the mapping of the application on the Software Defined Radio (SDR) platform [12] with two instances of the state-of-the-art ADRES embedded processor [7]. We show that when we do the parallelization wisely w.r.t. communication overhead, we can achieve the theoretical gain of a factor of two for the SoC with two processor instances.

We explain the whole flow using our mapping experiment where we have parallelized and mapped the 40 MHz MIMO SDM-OFDM application with focus on the parallelization and parallel mapping. The contributions of this paper are:
- Introducing the whole design flow, starting from the sequential specification and ending with the parallel implementation.
- Selecting the best parallelization option (based on communication overhead) from several discussed parallelization options.
- Providing a way to model the parallelization on two abstraction levels, the native LINUX platform and the platform (out of the four abstraction levels we use for sequential code, i.e., the native LINUX platform, the Compiled Code Simulator (CCS), the ADRES Virtual Machine (AVM), and the platform).
- Proving by simulation on the platform the benefits of the selected solution.

The paper is structured as follows. In Section II we briefly describe the application we are going to parallelize and map on the platform, and the platform itself. In Section III we explain the whole design flow and we highlight the parallelization
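The point about parallelizing wisely w.r.t. communication overhead can be made concrete with a back-of-the-envelope model (the numbers are illustrative, not the paper's measurements): the achievable speedup on two cores degrades as the time spent in inter-core communication grows.

```python
# Illustrative speedup model for a two-core SoC; not the paper's data.
# t_seq: sequential execution time; comm: communication time added
# after splitting the work evenly over n_cores.

def speedup(t_seq, n_cores, comm):
    t_par = t_seq / n_cores + comm
    return t_seq / t_par

# With negligible communication, the theoretical factor of two is reached:
ideal = speedup(t_seq=100.0, n_cores=2, comm=0.0)    # -> 2.0

# A poor split that communicates heavily loses much of the gain:
poor = speedup(t_seq=100.0, n_cores=2, comm=25.0)    # -> ~1.33
```

Under this simple model, choosing the split that minimizes `comm` is exactly what recovers the factor-of-two gain on two processor instances.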
Data transfers and storage are dominating contributors to the area and power consumption of all modern multimedia embedded systems. Modern high-level memory optimisations can ensure a cost-efficient realisation of these systems. An important step in these optimisations is loop transformations performed on a geometrical model. However, these loop transformations traditionally cannot optimise code across data-dependent conditions. In this paper we selectively duplicate code in order to enable global loop transformations across data-dependent conditions. We propose a technique which finds, in a systematic way, the Pareto curve in a 2D exploration space: better memory optimisation vs. code-size increase. Our technique has been tested on an MP3 audio decoder. Results show a 45.8% decrease in the number of main memory accesses at the cost of a 16.2% increase in code size.
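The selective-duplication idea can be sketched as follows (a toy Python analogue of the C loop nests, not the MP3 decoder code; `f`, `g`, `h` and the array size are invented): duplicating the producer loop into both branches of a data-dependent condition turns each branch into a single loop nest, which can then be fused so the intermediate array never travels to main memory — at the cost of carrying the producer code twice.

```python
# Toy illustration of enabling loop fusion across a data-dependent
# condition by code duplication; f, g, h and N are made up.

N = 8
f = lambda i: i * i      # producer
g = lambda a: a + 1      # consumer, "then" branch
h = lambda a: a - 1      # consumer, "else" branch

def before(cond):
    # Producer loop writes the whole intermediate array A ...
    A = [f(i) for i in range(N)]
    # ... and only then does the data-dependent consumer loop read it back:
    # the transfers of A dominate the main memory traffic.
    if cond:
        return [g(A[i]) for i in range(N)]
    return [h(A[i]) for i in range(N)]

def after(cond):
    # The producer loop is duplicated into both branches (code size grows),
    # so each branch is a single fused loop and each A element stays in
    # registers or local memory instead of going to main memory.
    if cond:
        return [g(f(i)) for i in range(N)]
    return [h(f(i)) for i in range(N)]
```

The two versions compute the same result; the duplication only trades code size for the global transformation opportunity, which is precisely the 2D trade-off the Pareto curve captures.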