Guillaume Iooss scite author profile

We present the first end-to-end modeling and compilation flow to parallelize hard real-time control applications while fully guaranteeing the respect of real-time requirements on off-the-shelf hardware. It scales to thousands of dataflow nodes and has been validated on two production avionics applications. Unlike classical optimizing compilation, it takes as input non-functional requirements (real time, resource limits). To enforce these requirements, the compiler follows a static resource allocation strategy, from coarse-grain tasks communicating over an interconnection network all the way to individual variables and memory accesses. It controls timing interferences resulting from mapping decisions in a precise, safe, and scalable way. CCS Concepts: • Computer systems organization → Multicore architectures; • Software and its engineering → Real-time systems software; Data flow languages; Compilers;

show abstract

On Program Equivalence with Reductions

Iooss

Alias

Rajopadhye

2014

View full text Add to dashboard Cite

Program equivalence is a well-known problem with a wide range of applications, such as algorithm recognition, program verification and program optimization. This problem is also known to be undecidable if the class of programs is rich enough, in which case semi-algorithms are commonly used. We focus on programs represented as Systems of Affine Recurrence Equations (SARE), defined over parametric polyhedral domains, a well known formalism for the polyhedral model. SAREs include as a proper subset, the class of affine control loop programs. Several program equivalence semi-algorithms were already proposed for this class. Some take into account algebraic properties such as associativity and commutativity. To the best of our knowledge, none of them manage reductions, i.e., accumulations of a parametric number of sub-expressions using an associative and commutative operator. Our main contribution is a new semialgorithm to manage reductions. In particular, we outline the ties between this problem and the perfect matching problem in a parametric bipartite graph.

show abstract

PALMED: Throughput Characterization for Superscalar Architectures

Derumigny

Bastian

Gruber

et al. 2022

View full text Add to dashboard Cite

In a super-scalar architecture, the scheduler dynamically assigns micro-operations (µOPs) to execution ports. The port mapping of an architecture describes how an instruction decomposes into µOPs and lists for each µOP the set of ports it can be mapped to. It is used by compilers and performance debugging tools to characterize the performance throughput of a sequence of instructions repeatedly executed as the core component of a loop.This paper introduces a dual equivalent representation: The resource mapping of an architecture is an abstract model where, to be executed, an instruction must use a set of abstract resources, themselves representing combinations of execution ports. For a given architecture, finding a port mapping is an important but difficult problem. Building a resource mapping is a more tractable problem and provides a simpler and equivalent model. This paper describes Palmed, a tool that automatically builds a resource mapping for pipelined, super-scalar, out-of-order CPU architectures. Palmed does not require hardware performance counters, and relies solely on runtime measurements.We evaluate the pertinence of our dual representation for throughput modeling by extracting a representative set of basic-blocks from the compiled binaries of the SPEC CPU 2017 benchmarks. We compared the throughput predicted by existing machine models to that produced by Palmed, and found comparable accuracy to state-of-the art tools, achieving sub-10 % mean square error rate on this workload on Intel's Skylake microarchitecture.

show abstract

Monoparametric Tiling of Polyhedral Programs

Iooss

Alias

Rajopadhye

2021

Int J Parallel Prog

View full text Add to dashboard Cite

Tiling is a crucial program transformation, adjusting the ops-to-bytes balance of codes to improve locality. Like parallelism, it can be applied at multiple levels. Allowing tile sizes to be symbolic parameters at compile time has many benefits, including efficient autotuning, and run-time adaptability to system variations. For polyhedral programs, parametric tiling in its full generality is known to be non-linear, breaking the mathematical closure properties of the polyhedral model. Most compilation tools therefore either perform fixed size tiling, or apply parametric tiling in only the final, code generation step.We introduce monoparametric tiling, a restricted parametric tiling transformation. We show that, despite being parametric, it retains the closure properties of the polyhedral model. We first prove that applying monoparametric partitioning (i) to a polyhedron yields a union of polyhedra with modulo conditions, and (ii) to an affine function produces a piecewise-affine function with modulo conditions. We then use these properties to show how to tile an entire polyhedral program. Our monoparametric tiling is general enough to handle tiles with arbitrary tile shapes that can tesselate the iteration space (e.g., hexagonal, trapezoidal, etc). This enables a wide range of polyhedral analyses and transformations to be applied.

show abstract

PALMED: Throughput Characterization for Superscalar Architectures -- Extended Version

Derumigny¹,

Gruber²,

Bastian³

et al. 2020

Preprint

View full text Add to dashboard Cite

IOOpt: automatic derivation of I/O complexity bounds for affine programs

Olivry

Iooss

Tollenaere

et al. 2021

View full text Add to dashboard Cite

This work was supported in part by the U.S. National Science Foundation through award 2018016, by MIAI Grenoble Alpes (ANR-19-P3IA-0003), and by the Bpifrance Programme d'Investissements d'Avenir (PIA) as part of the ES3CAP project.contraction and convolution kernels. Then we evaluate numerically the tightness of our bound using the convolution layers of Yolo9000 and representative tensor contractions from the TCCG benchmark suite. Finally, we show the pertinence of our I/O complexity model by reporting the running time of the recommended tiled code for the convolution layers of Yolo9000.

show abstract

Optimizing Dynamic Resource Allocation

Krakow

Rabiet

Zou

et al. 2014

Procedia Computer Science

View full text Add to dashboard Cite

Autotuning Convolutions Is Easier Than You Think

Tollenaere

Iooss

Pouget

et al. 2023

ACM Trans. Archit. Code Optim.

View full text Add to dashboard Cite

A wide range of scientific and machine learning applications depend on highly optimized implementations of tensor computations. Exploiting the full capacity of a given processor architecture remains a challenging task, due to the complexity of the microarchitectural features that come into play when seeking near-peak performance. Among the state-of-the-art techniques for loop transformations for performance optimization, AutoScheduler [34] tends to outperform other systems. It often yields higher performance as compared to vendor libraries, but takes a large number of runs to converge, while also involving a complex training environment. In this paper, we define a structured configuration space that enables much faster convergence to high-performance code versions, using only random sampling of candidates. We focus on two-dimensional convolutions on CPUs. Compared to state-of-the-art libraries, our structured search space enables higher performance for typical tensor shapes encountered in convolution stages in deep learning pipelines. Compared to auto-tuning code generators like AutoScheduler, it prunes the search space while increasing the density of efficient implementations. We analyze the impact on convergence speed and performance distribution, on two Intel x86 processors and one ARM AArch64 processor. We match or outperform the performance of the state-of-the-art oneDNN library and TVM’s AutoScheduler, while reducing the autotuning effort by at least an order of magnitude.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Guillaume Iooss

Correct-by-Construction Parallelization of Hard Real-Time Avionics Applications on Off-the-Shelf Predictable Hardware

On Program Equivalence with Reductions

PALMED: Throughput Characterization for Superscalar Architectures

Monoparametric Tiling of Polyhedral Programs

PALMED: Throughput Characterization for Superscalar Architectures -- Extended Version

IOOpt: automatic derivation of I/O complexity bounds for affine programs

Optimizing Dynamic Resource Allocation

Autotuning Convolutions Is Easier Than You Think

Contact Info

Product

Resources

About