gearshifft – The FFT Benchmark Suite for Heterogeneous Platforms

Steinbach, Peter; Werner, M.

doi:10.1007/978-3-319-58667-0_11

Cited by 15 publications

(10 citation statements)

References 24 publications


“…Figure 2 demonstrates that for small FFTs, Kabuki is less performant than either Hazelhen or Shaheen II, but for FFTs above a size of 512 points, Kabuki is significantly more performant than either Hazelhen or Shaheen II. Vector computers and graphics processing units typically have higher memory bandwidth but also higher latency than typical CPUS found, thus similar results for one dimensional FFT performance are also reported in [30], where for small FFTs, CPU performance is best, but for large FFTs, GPU performance is better. There are many parallel scientific computing programs that have modeling assumptions built into them (for example in computational fluid mechanics, materials science and chemistry).…”

Section: Lessons Learned (supporting)

confidence: 68%

“…Repeating experiments multiple times has been suggested as a means of verifying reproducibility in benchmarking [16]. Such a methodology has been implemented in gearshifft, a heterogeneous fast Fourier transform benchmark suite [30,36]. In most cases experiments were repeated several times, usually successively, with minor differences between results.…”

Section: Lessons Learned (mentioning)

confidence: 99%

“…Reproducibility here is taken to mean either reproducible computational results and/or reproducible execution time measurements. A number of reproducibility initiatives have been proposed [16,19,29,31]. These range from, a general algorithm description, making the code available, making an input deck and execution environment available, to documenting the entire workflow.…”

Section: Introduction (mentioning)

confidence: 99%


Reproducibility in Benchmarking Parallel Fast Fourier Transform based Applications

Aseeri, Muite, Takahashi 2019

Companion of the 2019 ACM/SPEC International Conference on Performance Engineering


An overview of concerns observed in allowing for reproducibility in parallel applications that heavily depend on the three dimensional distributed memory fast Fourier transform are summarized. Suggestions for reproducibility categories for benchmark results are given.

CCS CONCEPTS: • Mathematics of computing → Computation of transforms; • Theory of computation → Massively parallel algorithms; • Software and its engineering → Software performance; • Hardware → Testing with distributed and parallel systems.


“…Fast FFTs on GPUs with CUDA and OpenCL: FFTs on GPUs typically provide up to an order of magnitude advantage over FFTW [35], particularly if high-end NVIDIA GPUs are used. The pyculib library [36] (formerly Anaconda Accelerate [37]) provides a python wrapper around the NVIDIA cuFFT Library [38], allowing parallel computation of FFTs on a GPU.…”

Section: Accelerating the Discrete Fast Fourier Transform: GPUs And/o (mentioning)

confidence: 99%

Accelerated modeling of near and far-field diffraction for coronagraphic optical systems

Douglas, Perrin 2018

Space Telescopes and Instrumentation 2018: Optical, Infrared, and Millimeter Wave


Accurately predicting the performance of coronagraphs and tolerancing optical surfaces for high-contrast imaging requires a detailed accounting of diffraction effects. Unlike simple Fraunhofer diffraction modeling, near and far-field diffraction effects, such as the Talbot effect, are captured by plane-to-plane propagation using Fresnel and angular spectrum propagation. This approach requires a sequence of computationally intensive Fourier transforms and quadratic phase functions, which limit the design and aberration sensitivity parameter space which can be explored at high-fidelity in the course of coronagraph design. This study presents the results of optimizing the multi-surface propagation module of the open source Physical Optics Propagation in PYthon (POPPY) package. This optimization was performed by implementing and benchmarking Fourier transforms and array operations on graphics processing units, as well as optimizing multithreaded numerical calculations using the NumExpr python library where appropriate, to speed the end-to-end simulation of observatory and coronagraph optical systems. Using realistic systems, this study demonstrates a greater than five-fold decrease in wall-clock runtime over POPPY's previous implementation and describes opportunities for further improvements in diffraction modeling performance.


“…The HPC Challenge benchmark suite [32] which is developed by the University of Tennessee is one of the well-known HPC benchmark suites and is used in many research works [33][34][35]. This suite is composed of several benchmarks, each of which focuses on a particular feature of the HPC clusters such as the ability to do floating-point calculations, the communication speed between nodes, and the potentials of running demanding algorithms such as DFT.…”

Section: HPC Benchmarks and Results (mentioning)

confidence: 99%

Computational storage: an efficient and scalable platform for big data and HPC applications

et al. 2019


In the era of big data applications, the demand for more sophisticated data centers and high-performance data processing mechanisms is increasing drastically. Data are originally stored in storage systems. To process data, application servers need to fetch them from storage devices, which imposes the cost of moving data to the system. This cost has a direct relation with the distance of processing engines from the data. This is the key motivation for the emergence of distributed processing platforms such as Hadoop, which move process closer to data. Computational storage devices (CSDs) push the "move process to data" paradigm to its ultimate boundaries by deploying embedded processing engines inside storage devices to process data. In this paper, we introduce Catalina, an efficient and flexible computational storage platform, that provides a seamless environment to process data in-place. Catalina is the first CSD equipped with a dedicated application processor running a full-fledged operating system that provides filesystem-level data access for the applications. Thus, a vast spectrum of applications can be ported for running on Catalina CSDs. Due to these unique features, to the best of our knowledge, Catalina CSD is the only in-storage processing platform that can be seamlessly deployed in clusters to run distributed applications such as Hadoop MapReduce and HPC applications in-place without any modifications on the underlying distributed processing framework. For the proof of concept, we build a fully functional Catalina prototype and a CSD-equipped platform using 16 Catalina CSDs to run Intel HiBench Hadoop and HPC benchmarks to investigate the benefits of deploying Catalina CSDs in the distributed processing environments. The experimental results show up to 2.2× improvement in performance and 4.3× reduction in energy consumption, respectively, for running Hadoop MapReduce benchmarks. 
Additionally, thanks to the Neon SIMD engines, the performance and energy efficiency of DFT algorithms are improved up to 5.4× and 8.9×, respectively.
