Efficient code generation for image-processing applications remains a challenge in a domain where high performance is often necessary to meet real-time constraints. The key reasons behind this issue are the inherently complex structure of most image-processing pipelines, the plethora of transformations that can be applied to optimize an implementation, and the interaction of these optimizations with locality, redundant computation, and parallelism. Recent domain-specific languages (DSLs) such as the Halide DSL and compiler attempt to encourage high-level design-space exploration to facilitate the optimization process. We propose a novel optimization strategy that aims to maximize producer-consumer locality by exploiting reuse in image-processing pipelines. We implement our analysis as a tool that can be used alongside the Halide DSL to automatically generate schedules for pipelines implemented in Halide, and we test it on a variety of benchmarks. Experimental results on three different multi-core architectures show an average performance improvement of 40% over the Halide Auto-Scheduler and 75% over a state-of-the-art approach that targets the PolyMage DSL.
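To make the notion of producer-consumer locality concrete, the following sketch shows how a two-stage blur might be scheduled in Halide so that the producer stage is computed inside the consumer's tiles. The pipeline, tile sizes, and vector widths here are illustrative assumptions, not the schedules generated by the proposed tool.

```cpp
#include "Halide.h"
using namespace Halide;

int main() {
    // Illustrative two-stage 1D blurs forming a producer-consumer pair.
    ImageParam input(UInt(8), 2, "input");
    Func clamped = BoundaryConditions::repeat_edge(input);

    Var x("x"), y("y"), xo("xo"), yo("yo"), xi("xi"), yi("yi");
    Func blur_x("blur_x"), blur_y("blur_y");

    // Producer: horizontal blur (widened to 16 bit to avoid overflow).
    blur_x(x, y) = (cast<uint16_t>(clamped(x - 1, y)) +
                    clamped(x, y) + clamped(x + 1, y)) / 3;
    // Consumer: vertical blur over the producer's output.
    blur_y(x, y) = cast<uint8_t>((blur_x(x, y - 1) +
                                  blur_x(x, y) + blur_x(x, y + 1)) / 3);

    // Schedule: tile the consumer, then compute the producer per tile so
    // its intermediate rows are reused while still resident in cache.
    blur_y.tile(x, y, xo, yo, xi, yi, 256, 32)
          .parallel(yo)
          .vectorize(xi, 8);
    blur_x.compute_at(blur_y, xo)
          .vectorize(x, 8);

    blur_y.compile_jit();  // push the schedule through the compiler
    return 0;
}
```

Moving `blur_x.compute_at(...)` between loop levels of `blur_y` trades redundant computation at tile edges against locality, which is exactly the design space such a scheduling tool explores automatically.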
It has been shown that wide Single Instruction Multiple Data architectures (wide-SIMDs) can achieve high energy efficiency, especially in domains such as image and vision processing. In these and various other application domains, reduction is a frequently encountered operation, in which multiple input elements are combined into a single element by an associative operation, e.g., addition or multiplication. Many applications require reduction, such as partial-histogram merging, matrix multiplication, and min/max finding. Wide-SIMDs contain a large number of processing elements (PEs) which, for scalability reasons, are generally connected by a minimal form of interconnect. To efficiently support reduction operations on wide-SIMDs with such a minimal interconnect, we introduce two novel reduction algorithms that rely on neither complex communication networks nor dedicated hardware. The proposed approaches are compared with both dedicated hardware and other software solutions in terms of performance, area, and energy consumption. A practical case study demonstrates that the proposed software approach offers much better generality and flexibility, with no additional hardware cost. Compared to a dedicated hardware adder tree, the proposed software approach saves 6.8% in area with a performance penalty of only 7.1%.
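As background for the kind of operation being accelerated, the sketch below shows a classic log2(N)-step pairwise (tree) reduction in plain C++. It models the PE array as a simple vector; it is not one of the two algorithms proposed in the paper, which additionally have to account for the minimal interconnect of the PE array.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Pairwise tree reduction: in each step, PE i accumulates the value held by
// the PE at distance `stride`, halving the number of active PEs every step.
int tree_reduce(std::vector<int> pe) {
    for (std::size_t stride = 1; stride < pe.size(); stride *= 2) {
        for (std::size_t i = 0; i + stride < pe.size(); i += 2 * stride) {
            pe[i] += pe[i + stride];  // associative combine (here: addition)
        }
    }
    return pe.empty() ? 0 : pe[0];
}

int main() {
    std::vector<int> lanes = {1, 2, 3, 4, 5, 6, 7, 8};
    std::cout << tree_reduce(lanes) << "\n";  // prints 36
    return 0;
}
```

On a wide-SIMD whose PEs can only exchange data with immediate neighbours, each distance-`stride` exchange in this scheme costs multiple single-hop moves; managing that communication cost in software is precisely the problem the proposed algorithms address.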
In the literature, computer architectures are frequently claimed to be "highly flexible", typically implying the existence of trade-offs between flexibility and performance or energy efficiency. Processor flexibility, however, is not sharply defined, and consequently these claims cannot be validated, nor can such hypothetical relations be fully understood and exploited in the design of computing systems. This paper is an attempt to introduce scientific rigour to the notion of flexibility in computing systems. A survey is conducted to provide an overview of references to flexibility in the literature, both in the computer architecture domain and in related fields. A classification is introduced to categorize different views on flexibility, which ultimately forms the foundation for a qualitative definition of flexibility. Departing from this qualitative definition, a generic quantifiable metric is proposed, enabling valid quantitative comparison of the flexibility of various architectures. To validate the proposed method, and to evaluate the relation between the proposed metric and the general notion of flexibility, the flexibility metric is measured for 25 computing systems, including CPUs, GPUs, DSPs, and FPGAs, as well as 40 ASIPs taken from the literature. The obtained results provide insights into some of the speculative trade-offs between flexibility and properties such as energy efficiency and area efficiency. Overall, the proposed quantitative flexibility metric proves to be commensurate with some generally accepted qualitative notions of flexibility collected in the survey, although some surprising discrepancies can also be observed. The proposed metric and the obtained results are placed in the context of the state of the art on compute flexibility, and an extensive reflection provides not only a complete overview of the field, but also discusses possible alternative approaches and open issues. Note that this work does not aim to provide a final answer to the definition of flexibility, but rather provides a framework to initiate a broader discussion in the computer architecture community on defining, understanding, and ultimately taking advantage of flexibility.
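The paper's actual metric is not reproduced here. Purely as an illustration of what a quantifiable flexibility metric can look like, the snippet below scores an architecture by how evenly its efficiency is spread across a benchmark suite, using a normalized-dispersion measure; both the score definition and the numbers are assumptions of this sketch, not the paper's definition or data.

```cpp
#include <algorithm>
#include <cmath>
#include <iostream>
#include <numeric>
#include <vector>

// Hypothetical flexibility score: 1 minus the coefficient of variation of an
// architecture's efficiency across a benchmark suite, clamped to [0, 1].
// An architecture that performs uniformly across workloads scores near 1;
// one that excels on a single workload and collapses elsewhere scores near 0.
// This is an illustrative stand-in, not the metric defined in the paper.
double flexibility_score(const std::vector<double>& efficiency) {
    double mean = std::accumulate(efficiency.begin(), efficiency.end(), 0.0) /
                  efficiency.size();
    double var = 0.0;
    for (double e : efficiency) var += (e - mean) * (e - mean);
    var /= efficiency.size();
    double cv = std::sqrt(var) / mean;  // coefficient of variation
    return std::max(0.0, 1.0 - cv);
}

int main() {
    // Normalized efficiency (e.g., ops/J) of two hypothetical devices
    // across the same four benchmarks.
    std::vector<double> cpu  = {0.8, 0.7, 0.9, 0.8};    // even performer
    std::vector<double> asic = {1.0, 0.05, 0.05, 0.05}; // single-purpose
    std::cout << "cpu:  " << flexibility_score(cpu)  << "\n";  // ~0.91
    std::cout << "asic: " << flexibility_score(asic) << "\n";  // 0
    return 0;
}
```

Any real metric must also settle what "efficiency" means and over which workload set it is measured; those choices, as the paper argues, are exactly where a rigorous definition of flexibility is needed.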