Abstract-Several optimizations must be considered for the design of streaming applications (e.g. multimedia or network packet processing). These applications can be modelled as a set of processes that communicate using buffers. Cyclo-Static Dataflow graphs, which are an extension of Synchronous Dataflow graphs, allow to consider a large class of industrial applications. This paper presents an original methodology to minimize the global surface of the buffers for a Cyclo-Static Dataflow graph under a given throughput constraint. It is proved that, if the processes are periodic, each buffer introduces a linear constraint described analytically. The optimization problem is then modelled by an Integer Linear Program. A polynomial algorithm based on its relaxation provides a quasi-optimal solution for real life problems. The resolution of the optimization problem for a ReedSolomon Decoder application is then detailed.
Abstract-The design of streaming (e.g. multimedia or network packet processing) applications must consider several optimizations such as the mimimization of the whole surface of the memory needed on a Chip. The minimum throughput of the output is usually fixed. In this paper, we present an original methodology to solve this problem.The application is modelled using a Marked Timed Weighted Event Graphs (in short MTWEG), which is a subclass of Petri nets. Transitions correspond to specific treatments and the places model buffers for data transfers. It is assumed that transitions are periodically fired with a fixed throughput.The problem is first mathematically modelled using an Integer Linear Program. We then study for a unique buffer the optimum throughput according to the capacity. A first polynomial simple algorithm computing the minimum surface for a fixed throughput is derived when there is no circuit in the initial MTWEG, which corresponds to a wide class of applications. We prove in this case that the capacities of every buffer may be optimized independently.For general MTWEG, the problem is NP-Hard and an original polynomial 2-approximation algorithm is presented. For practical applications, the solution computed is very close to the optimum.
International audienceCyclo-Static DataFlow Graphs (CSDFG in short) is a formalism commonly used to model parallel applications composed by actors communicating through buffers. The liveness of a CSDFG ensures that all actors can be executed infinitely often. This property is clearly fundamental for the design of embedded applications. This paper aims to present first an original sufficient condition of liveness for a CSDFG. Two algorithms of polynomial-time for checking the liveness are then derived and compared to a symbolic execution of the graph. An original method to compute close-to-optimal buffer capacities ensuring liveness is also presented and experimentaly tested. The performance of our methods are comparable to those existing in the literature for industrial applications. However, they are far more effective on randomly generated instances, ensuring their scalability for future more complex applications and their possible implementation in a compiler
This paper demonstrates the combined use of three simulation tools in support of a co-design methodology for an HPC-focused System-on-a-Chip (SoC) design. The simulation tools make different trade-offs between simulation speed, accuracy and model abstraction level, and are shown to be complementary. We apply the MUSA trace-based simulator for the initial sizing of vector register length, system-level cache (SLC) size and memory bandwidth. It has proven to be very efficient at pruning the design space, as its models enable sufficient accuracy without having to resort to highly detailed simulations. Then we apply gem5, a cycle-accurate micro-architecture simulator, for a more refined analysis of the performance potential of our reference SoC architecture, with models able to capture detailed hardware behavior at the cost of simulation speed. Furthermore, we study the network-on-chip (NoC) topology and IP placements using both gem5 for representative small-to medium-scale configurations and SESAM/VPSim, a transaction-level emulator for larger scale systems with good simulation speed and sufficient architectural details. Overall, we consider several system design concerns, such as processor subsystem sizing and NoC settings. We apply the selected simulation tools, focusing on different levels of abstraction, to study several configurations with various design concerns and evaluate them to guide architectural design and optimization decisions. Performance analysis is carried out with a number of representative benchmarks. The obtained numerical results provide guidance and hints to designers regarding SIMD instruction width, SLC sizing, memory bandwidth as well as the best placement of memory controllers and NoC form factor. Thus, we provide critical insights for efficient design of future HPC microprocessors.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.