2017
DOI: 10.1007/978-3-319-65578-9_4
Compiling and Optimizing OpenMP 4.X Programs to OpenCL and SPIR

Cited by 10 publications (7 citation statements). References 14 publications.
“…fork/join. Pereira et al [16] describe a framework that automatically converts program sections annotated with OpenMP 4.x directives into OpenCL kernels. This design goes in the direction of our work, but we perform a step further, considering the severe constraints of the ULP MCUs, requiring specific optimizations.…”
Section: Related Work (mentioning)
confidence: 99%
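
The framework described in the citation above consumes ordinary OpenMP 4.x offloading regions. A minimal sketch of such an input region follows; the function name vec_scale, the array a, the scalar s, and the size n are illustrative assumptions, not taken from the paper.

    /* Sketch of an OpenMP 4.x offloading region of the kind a
     * source-to-source compiler can lower to an OpenCL kernel.
     * vec_scale, a, s, and n are illustrative names only. */
    #include <stddef.h>

    void vec_scale(float *a, float s, size_t n) {
        /* The structured block becomes the body of a generated kernel;
         * each loop iteration maps naturally to one work-item. */
        #pragma omp target teams distribute parallel for map(tofrom: a[0:n])
        for (size_t i = 0; i < n; ++i)
            a[i] = s * a[i];
    }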
“…Most of the rest of the literature that strictly follows the OpenMP accelerator model studied GPU offloading and demonstrate good results [23,24]. Nevertheless, some of them look for more untraditional targets like the Intel Xeon Phi platform or FPGAs [65].…”
Section: Related Work (mentioning)
confidence: 99%
“…The map clause details the mapping of the data between the host and the target device: inputs (A and B) are mapped to the target, and the output (C) is mapped from the target. While typical target devices are DSP cores, GPUs, Xeon Phi accelerators, and so on [23,24], this article introduces the cloud as yet another target device available from the local computer, giving the programmer the ability to quickly expand the computational power of its own computer to a large-scale cloud cluster. Using OmpCloud, the programmer can leverage on his/her basic OpenMP knowledge.…”
Section: Introduction (mentioning)
confidence: 99%
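
The map-clause usage quoted above can be written out concretely. The sketch below assumes a simple vector addition (the actual operation is not specified in the quote): inputs A and B are mapped to the device, and the output C is mapped from it.

    /* Illustrative use of the map clause described above: A and B are
     * copied to the target device, C is copied back to the host.
     * The vector-addition body and the name vec_add are assumptions. */
    #include <stddef.h>

    void vec_add(const float *A, const float *B, float *C, size_t n) {
        #pragma omp target teams distribute parallel for \
                map(to: A[0:n], B[0:n]) map(from: C[0:n])
        for (size_t i = 0; i < n; ++i)
            C[i] = A[i] + B[i];
    }

A compiler implementing the accelerator model turns such a region into a device kernel plus the host-side data transfers implied by the to/from maps.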
“…While those works demonstrate good performance, Xeon Phi are quite an exotic platforms not really available to the majority of users. Most of the rest of the literature which strictly follows the OpenMP accelerator model studied GPU offloading and demonstrate good results for heavily parallel applications [8], [9]. Nevertheless, some of them look for more untraditional targets like the Intel Xeon Phi platform or FPGAs [24].…”
Section: Related Work (mentioning)
confidence: 99%
“…While the typical devices used in OpenMP 4.X are DSP cores, GPUs, Xeon Phi accelerators, etc. [8], [9], we introduced the cloud as a novel target device available on the computer. This was done within a programming framework we call OmpCloud [3] that extends the OpenMP accelerator model to allow transparent cloud offloading and cluster programming.…”
Section: Introduction (mentioning)
confidence: 99%