Joanna Potiopa scite author profile

This paper concerns an Intel Xeon Phi implementation of the explicit fourth-order Runge-Kutta method (RK4) for very sparse matrices with very short rows. Such matrices arise in Markovian modeling of computer and telecommunication networks. Two implementations were investigated, both using the CSR storage scheme and running on Intel Xeon Phi: one based on Intel Math Kernel Library (Intel MKL) routines and the authors' own. The implementation based on Intel MKL uses the high-performance BLAS and Sparse BLAS routines. In our application we focus on OpenMP-style programming. We implement the SpMV operation and vector addition using basic optimization techniques and vectorization. We evaluate our approach in native and offload modes for various numbers of cores and thread affinity settings. Both implementations (based on Intel MKL and made by the authors) were compared with respect to time, speedup, and performance. The numerical experiments on Intel Xeon Phi show that the performance of the authors' implementation is very promising: it gives a gain of up to two times compared to the multithreaded implementation (based on Intel MKL) running on a CPU (Intel Xeon processor) and as much as three times compared to the application that uses Intel MKL on Intel Xeon Phi. The authors are grateful to Czestochowa University of Technology for granting access to Intel CPU and Intel Xeon Phi platforms provided by the MICLAB
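As a rough illustration of the two kernels the abstract names (SpMV on a CSR matrix and vector addition), the sketch below performs one RK4 step for the linear system y' = A y. It is plain, sequential Python for readability only and does not reproduce the authors' OpenMP or Intel MKL code; the function names `csr_spmv`, `axpy`, and `rk4_step` and the choice of the system y' = A y are assumptions made for this example.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Return y = A @ x for a matrix A stored in CSR form.

    values  : nonzero entries, row by row
    col_idx : column index of each entry in `values`
    row_ptr : row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # With very short rows this inner loop runs only a few times.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def axpy(alpha, x, y):
    """Return alpha * x + y element-wise (the vector-addition kernel)."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def rk4_step(values, col_idx, row_ptr, y, h):
    """Advance y' = A y by one classical RK4 step of size h.

    Each step costs four SpMV calls plus a handful of vector updates.
    """
    k1 = csr_spmv(values, col_idx, row_ptr, y)
    k2 = csr_spmv(values, col_idx, row_ptr, axpy(h / 2.0, k1, y))
    k3 = csr_spmv(values, col_idx, row_ptr, axpy(h / 2.0, k2, y))
    k4 = csr_spmv(values, col_idx, row_ptr, axpy(h, k3, y))
    return [
        yi + (h / 6.0) * (a + 2.0 * b + 2.0 * c + d)
        for yi, a, b, c, d in zip(y, k1, k2, k3, k4)
    ]


# Hypothetical 2x2 example: A = [[-1, 2], [1, -2]], whose columns sum to zero
# (as for a transposed Markov generator), so the entries of y keep their sum.
values, col_idx, row_ptr = [-1.0, 2.0, 1.0, -2.0], [0, 1, 0, 1], [0, 2, 4]
state = rk4_step(values, col_idx, row_ptr, [1.0, 0.0], 0.1)
```

Because every step is dominated by the four SpMV calls, the performance of that kernel largely determines the performance of the whole method in a sketch like this one.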


The impact of vectorization and parallelization of the slope algorithm on performance and energy efficiency on multi-core architecture

Bylina, Potiopa, Klisowski et al. 2021


Calculation of land-surface parameters (e.g. slope, aspect, curvature) is an important part of many geospatial analyses. Current research trends are aimed at developing new software techniques to achieve the best trade-off between performance and energy. In our work, we concentrate on vectorization and parallelization to improve the overall energy efficiency and performance of neighborhood raster algorithms for the computation of land-surface parameters. We chose the slope calculation algorithm as the basis for our investigation. The parallelization was achieved by redesigning the original sequential code with OpenMP SIMD vectorization hints for the compiler, OpenMP loop parallelization, and a hybrid of these techniques. To evaluate both performance and energy savings, we tested our vector-parallel implementations on a multi-core computer for various data sizes. The RAPL interface was used to measure energy consumption. The results showed that optimization towards high performance can also be an effective strategy for improving energy efficiency.
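To make the idea of a neighborhood raster algorithm concrete, the sketch below computes slope from a 3x3 window around each interior cell. The abstract does not say which slope formula the paper uses; Horn's finite-difference method is assumed here as one common choice, and the function name `horn_slope_degrees` and the handling of border cells are likewise assumptions. The code is sequential Python and illustrates only the loop structure, not the OpenMP implementation.

```python
import math


def horn_slope_degrees(dem, cell_size=1.0):
    """Return a grid of slope angles (degrees) for an elevation grid.

    dem is a list of equally long rows of elevations. Border cells have no
    complete 3x3 neighborhood and are left as None in this sketch.
    """
    rows, cols = len(dem), len(dem[0])
    slope = [[None] * cols for _ in range(rows)]
    # Each output cell depends only on input cells, so the iterations are
    # independent: the row loop is a natural candidate for loop-level
    # parallelism and the column loop for SIMD vectorization.
    for r in range(1, rows - 1):
        up, mid, down = dem[r - 1], dem[r], dem[r + 1]
        for c in range(1, cols - 1):
            nw, n, ne = up[c - 1], up[c], up[c + 1]
            w, e = mid[c - 1], mid[c + 1]
            sw, s, se = down[c - 1], down[c], down[c + 1]
            # Weighted differences across the window (Horn's method).
            dzdx = ((ne + 2.0 * e + se) - (nw + 2.0 * w + sw)) / (8.0 * cell_size)
            dzdy = ((sw + 2.0 * s + se) - (nw + 2.0 * n + ne)) / (8.0 * cell_size)
            slope[r][c] = math.degrees(math.atan(math.hypot(dzdx, dzdy)))
    return slope


# Hypothetical input: a plane rising one unit per cell from west to east.
plane = [[0.0, 1.0, 2.0], [0.0, 1.0, 2.0], [0.0, 1.0, 2.0]]
plane_slope = horn_slope_degrees(plane)
```

On this tilted plane the single interior cell has a gradient of one unit of rise per unit of distance, which corresponds to a slope of about 45 degrees.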


Piecewise Cubic Interpolation on Distributed Memory Parallel Computers and Clusters of Workstations

Stpiczyński, Potiopa


The aim of this paper is to present two new portable, high-performance implementations of routines that can be used for piecewise cubic interpolation. The first (sequential) is based on LAPACK routines, while the second, based on ScaLAPACK, is designed for distributed-memory parallel computers and clusters. The results of experiments performed on a cluster of twenty Itanium 2 processors and on a Cray X1 are also presented and briefly discussed.


Solving a kind of BVP for ODEs on heterogeneous CPU + CUDA-enabled GPU systems

Stpiczyński, Potiopa 2010



Joanna Potiopa

Solving a kind of boundary-value problem for ordinary differential equations using Fermi—The next generation CUDA computing architecture

Explicit Fourth-Order Runge–Kutta Method on Intel Xeon Phi Coprocessor
