From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation

Błażewicz, Marek; Hinder, Ian; Koppelman, David M.; Brandt, Steven R.; Ciznicki, Milosz; Kierzynka, Michał; Löffler, Frank; Schnetter, Erik; Jian, Tao

doi:10.1155/2013/167841

Cited by 13 publications

(14 citation statements)

References 37 publications

“…The framework focuses only on the GPU architecture. Similarly, work in [1] utilises a simple decomposition method with uniform partition where each processor and accelerator receives blocks of the same size. On the other hand, authors in [20] provide a method that allows programmers to partition the data contiguously between CPU and GPU within a single node.…”

Section: Related Work (mentioning)

confidence: 99%

“…for computational fluid dynamics, geometric modelling, solving partial differential equations or image and video processing [1][2][3][4][5]. As computing time and memory usage grow linearly with the number of array elements in stencil computations, our research targets highly parallel implementations of stencil codes together with task scheduling and optimization techniques taking into consideration energy cost and data locality [6][7][8][9][10].…”

Section: Introduction (mentioning)

confidence: 99%

Energy aware scheduling model and online heuristics for stencil codes on heterogeneous computing architectures

2016

Self Cite

The performance of high-end supercomputers will reach the exascale through the advent of core counts in the billions. However, in the upcoming exascale computing era it is important to focus not only on performance, but also on the scalability of fine-grained parallel applications, data locality and energy-aware scheduling within the parallel code. In fact, parallel applications need to change even now, by redesigning algorithms and data structures to take advantage of the recent improvements in energy efficiency of heterogeneous computing hardware, including multicore processors and GPU accelerators. Over the next few years one of the biggest challenges for exascale will be the ability of parallel applications to fully exploit locality, which will, in turn, be required to achieve the expected performance and energy efficiency. Future highly parallel applications will have to deal with deep memory hierarchies, taking into account the energy cost of moving data off-chip. Therefore, they will have to apply new coordinated scheduling approaches to balance energy-aware resource utilization and minimize work starvation during runtime. As new constraints and limits on memory bandwidth and energy will play a key role in high performance computing (HPC) in the future, more sophisticated and dynamic scheduling techniques will be needed and applied within the parallel code. In this paper we focus on an energy-aware distribution of the stencil workload on heterogeneous processors. Our analysis of energy and performance models focuses on a relevant class of stencil computations to explore the relationship between task scheduling algorithms and energy constraints. More precisely, we search for a schedule which minimizes the energy usage within a specified deadline for the stencil workload on heterogeneous architectures. Since the problem is computationally intractable, we present an integer linear programming formulation for finding optimal schedules.
As finding optimal schedules is time consuming, we have developed four heuristics and tested them experimentally with respect to optimal solutions. In our work we focus on single-node configurations with heterogeneous processors. These configurations represent state-of-the-art multi- and many-core architectures.
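
The objective stated in this abstract (minimize energy subject to a deadline) can be illustrated with a small sketch. This is not the paper's integer linear programming formulation, nor any of its four heuristics: it is a plain exhaustive search over block counts, and the `Device` fields, the `best_split` function and the energy model (active power multiplied by busy time, with idle power ignored) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Device:
    """Hypothetical device model; the fields are illustrative assumptions."""
    name: str
    seconds_per_block: float  # assumed time to process one stencil block
    watts: float              # assumed active power draw while busy


def best_split(
    devices: Sequence[Device], total_blocks: int, deadline: float
) -> Optional[Tuple[Tuple[int, ...], float]]:
    """Exhaustively search for the block counts that minimize energy.

    Devices are assumed to run in parallel, so the makespan is the
    longest per-device busy time. Returns (counts, energy_in_joules),
    or None when no split meets the deadline.
    """
    best = None
    for counts in product(range(total_blocks + 1), repeat=len(devices)):
        if sum(counts) != total_blocks:
            continue  # every block must be assigned exactly once
        times = [n * d.seconds_per_block for n, d in zip(counts, devices)]
        if max(times) > deadline:
            continue  # this split misses the deadline
        energy = sum(t * d.watts for t, d in zip(times, devices))
        if best is None or energy < best[1]:
            best = (counts, energy)
    return best
```

With a slow but frugal CPU (2.0 s and 100 J per block) and a fast but power-hungry GPU (0.5 s and 150 J per block), the search pushes as many blocks onto the CPU as the deadline allows and gives the remainder to the GPU. The search space grows exponentially with the number of devices, so the sketch is only practical for tiny instances.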

“…Some of the implementations, especially those taking advantage of modern GPUs, have become specific gems in the world of high performance computing [13]. The choice of computational architecture of this kind was not incidental though, as its great potential has already been demonstrated in many other works related to scientific simulations [14,15], databases [16,17] or optimization problems [18]. Historically, the first implementation of the Smith-Waterman algorithm using CUDA-capable GPUs was developed by Manavski S. et al. [19].…”

Section: Related Work (mentioning)

confidence: 99%

G-DNA – a highly efficient multi-GPU/MPI tool for aligning nucleotide reads

Frohmberg, Kierzynka, Błażewicz et al. 2013

Bulletin of the Polish Academy of Sciences: Technical Sciences

Abstract. DNA/RNA sequencing has recently become a primary way researchers generate biological data for further analysis. Assembly algorithms are an integral part of this process. However, some of them require pairwise alignment to be applied to a great many reads. Although several efficient alignment tools have been released over the past few years, including those taking advantage of GPUs (Graphics Processing Units), none of them directly targets high-throughput sequencing data. As a result, a need arose to create software that could handle such data as effectively as possible. G-DNA (GPU-based DNA aligner) is the first highly parallel solution that has been optimized to process nucleotide reads (DNA/RNA) from modern sequencing machines. Results show that the software reaches up to 89 GCUPS (Giga Cell Updates Per Second) on a single GPU and as a result it is the fastest tool in its class. Moreover, it scales well on multi-GPU systems, including MPI-based computational clusters, where its performance is counted in TCUPS (Tera CUPS).

“…They have been successfully used as accelerators for example in gas and oil industry [7,8], medical imaging [9][10][11], bioinformatics [12][13][14], metaheuristics [15], or stencil-based computations [16,17]. Nevertheless, the primary application of GPUs is still the image and video processing [18][19][20][21].…”

Section: Introduction (mentioning)

confidence: 99%

Real-time motion tracking using optical flow on multiple GPUs

Mahmoudi, Kierzynka, Manneback et al. 2014

Bulletin of the Polish Academy of Sciences: Technical Sciences

Abstract. Motion tracking algorithms are widely used in computer vision related research. However, with the new video standards, especially those in high resolutions, current implementations, even running on modern hardware, no longer meet the needs of real-time processing. To overcome this challenge several GPU (Graphics Processing Unit) computing approaches have recently been proposed. Although they demonstrate the great potential of the GPU platform, hardly any is able to process high definition video sequences efficiently. Thus, a need arose to develop a tool able to address the outlined problem. In this paper we present software that implements optical flow motion tracking using the Lucas-Kanade algorithm. It is also integrated with the Harris corner detector and therefore the algorithm may perform sparse tracking, i.e. tracking of the meaningful pixels only. This substantially lowers the computational burden of the method. Moreover, both parts of the algorithm, i.e. corner selection and tracking, are implemented on the GPU and, as a result, the software is immensely fast, allowing for real-time motion tracking on videos in Full HD or even 4K format. In order to deliver the highest performance, it also supports multi-GPU systems, where it scales very well.
