We present a patch-based approach for tsunami simulation with parallel adaptive mesh refinement on the Salomon supercomputer. The special architecture of Salomon, with two Intel Xeon CPUs (Haswell architecture) and two Intel Xeon Phi coprocessors (Knights Corner) per compute node, suggests truly heterogeneous load balancing instead of offload approaches, because host and accelerator achieve comparable performance for our simulations. We use a tree-structured mesh refinement strategy resulting from newest-vertex bisection of triangular grid cells, but introduce small uniform grid patches into the leaves of the tree to allow vectorisation of the Finite Volume solver over grid cells. In particular, we implemented vectorised versions of the approximate Riemann solvers, exploiting Fortran's array notation where possible. While large patches increase computational performance due to vectorisation, improved memory access and reduced meshing overhead, they also increase the overall number of processed cells. Thus, a trade-off must be found regarding the patch size. We experimented with different patch sizes in a study of the time-to-solution of a simulation of the 2011 Tohoku tsunami, and found that relatively small patches with 8² cells resulted in the smallest execution times. We use the Xeon Phis in symmetric mode and apply heterogeneous load balancing between hosts and coprocessors, identifying the relative load distribution either from on-the-fly runtime measurements or from a priori exhaustive testing. Both approaches perform better than homogeneous load balancing and better than using only the CPUs or only the Xeon Phi coprocessors in native mode. In all setups, however, the absolute speedups are impeded by the slow MPI communication between Xeon Phi coprocessors.
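The core idea behind the heterogeneous load balancing described above is to split cells between host and coprocessor in proportion to their measured throughputs. A minimal sketch of that proportional split, with hypothetical names and rates (the paper's actual scheme operates on a distributed adaptive mesh):

```python
def split_load(n_cells, host_rate, phi_rate):
    """Split n_cells between host and coprocessor in proportion to
    their measured throughputs (cells per second), as obtained either
    from on-the-fly runtime measurements or a priori testing.
    Returns (host_cells, phi_cells). Illustrative sketch only."""
    host_share = host_rate / (host_rate + phi_rate)
    host_cells = round(n_cells * host_share)
    return host_cells, n_cells - host_cells

# Example: host and Xeon Phi achieving comparable performance.
host, phi = split_load(1_000_000, host_rate=1.2e6, phi_rate=1.0e6)
```

With equal rates this degenerates to the homogeneous 50/50 split; the measured rates shift the boundary toward the faster device.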
CCS CONCEPTS • Mathematics of computing → Mathematical software performance; Partial differential equations; • Computing methodologies → Massively parallel and high-performance simulations; Vector / streaming algorithms; • Applied computing → Earth and atmospheric sciences;
The viewshed is the region of the terrain that is visible to a fixed observer, who may be on or above the terrain. Its applications range from visual nuisance abatement to radio transmitter siting and surveillance. We present an improved algorithm and implementation for external-memory viewshed computation. It is about four times faster than the most recent and most efficient published methods, and it is also much simpler. Since processing large datasets can take hours, this improvement is significant. To reduce the total number of I/O operations, our method subdivides the terrain into blocks, which are stored in a special data structure managed as a cache memory.
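The elementary test underlying any viewshed computation is whether a target point is hidden behind intervening terrain. A minimal line-of-sight sketch on a 1-D elevation profile, tracking the maximum slope seen so far (illustrative only, not the paper's external-memory algorithm):

```python
def visible(elev, observer, target, obs_height=0.0):
    """Return True if `target` is visible from `observer` along a 1-D
    terrain profile `elev` (heights sampled along the line of sight).
    A point is visible when its slope from the observer is at least
    the maximum slope of all intermediate samples."""
    h0 = elev[observer] + obs_height
    max_slope = float("-inf")
    for x in range(observer + 1, target):
        max_slope = max(max_slope, (elev[x] - h0) / (x - observer))
    return (elev[target] - h0) / (target - observer) >= max_slope

# A ridge at x=1 hides the lower cell at x=2 but not the peak at x=3.
ridge = [0.0, 1.0, 0.0, 6.0]
```

A 2-D viewshed repeats this test along rays from the observer to every boundary cell.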
We discuss vectorization of normal and transverse Riemann solvers for the single- and multi-layer shallow water equations. Our approach is simple and portable, as it is based on auto-vectorization by the compiler, aided by OpenMP 4.0 directives. Despite the high complexity of the solver routines, the Intel Fortran Compiler proved able to successfully vectorize loops containing calls to these solvers after only a few small changes to their code. We evaluate the performance of the vectorized Riemann solvers within the context of GeoClaw, a software package designed for the simulation of geophysical flows with finite volume methods. Our performance studies consider two platforms with different sets of SIMD instructions: a dual-socket Haswell system with the AVX2 instruction set (256-bit) and an Intel Xeon Phi (Knights Landing) with AVX-512 instructions (512-bit). The experimental results indicate performance improvements of up to 2.1x on the former platform and up to 6.5x on the latter (with double-precision arithmetic). We also show that these speedups can easily compensate for the overhead introduced by the rearrangement of the simulation data structures, which might be necessary to achieve efficient vectorization.
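The "rearrangement of the simulation data structures" mentioned above typically means switching from an array-of-structures to a structure-of-arrays layout, so that one quantity for many cell interfaces is contiguous in memory and a whole batch is processed per vector operation. A NumPy illustration of the idea (the paper itself works in Fortran with OpenMP directives; NumPy here merely stands in for SIMD execution, and the flux formula is the textbook 1-D shallow-water flux, not the paper's solver):

```python
import numpy as np

g = 9.81  # gravitational acceleration

# Structure-of-arrays layout: one contiguous array per quantity.
h  = np.array([2.0, 1.5, 1.0, 0.5])   # water depth at each interface
hu = np.array([0.4, 0.3, 0.2, 0.1])   # momentum at each interface

# Shallow-water flux components evaluated for all interfaces at once,
# one vectorized expression per quantity instead of a per-cell loop.
flux_h  = hu
flux_hu = hu**2 / h + 0.5 * g * h**2
```

With an array-of-structures layout, the same computation would require a gather per field, which defeats auto-vectorization.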
This article presents TILEDVS, a fast external-memory algorithm and implementation for computing viewsheds. TILEDVS is intended for terrains that are too large for internal memory, even larger than 100,000×100,000 points. It subdivides the terrain into tiles that are stored compressed on disk and then paged into memory with a custom cache data structure using a least-recently-used (LRU) replacement policy. If there is sufficient memory available to store a whole row of tiles, which is easy, then this specialized data management is faster than relying on the operating system's virtual memory management. Applications of viewshed computation include siting radio transmitters, surveillance, and visual environmental impact measurement. TILEDVS sweeps a rotating line of sight from the observer to points on the region boundary. For each boundary point, it computes the visibility of all terrain points close to the line of sight. The running time is linear in the number of points, and no terrain tile is read more than twice. TILEDVS is very fast: for instance, processing a 104,000×104,000 terrain on a modest computer with only 512 MB of RAM took only 1.5 hours. On large datasets, TILEDVS was several times faster than competing algorithms, such as the ones included in GRASS. The source code of TILEDVS is freely available for nonprofit researchers to study, use, and extend. A preliminary version of this algorithm appeared in a four-page ACM SIGSPATIAL GIS 2012 conference paper, "More Efficient Terrain Viewshed Computation on Massive Datasets Using External Memory." This more detailed version adds a fast lossless compression stage that reduces the time by 30% to 40%, and many more experiments and comparisons.
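The LRU tile paging described above can be sketched in a few lines. This is a simplified model with hypothetical names (`TileCache`, `load_tile`); the real TILEDVS cache additionally compresses tiles on disk:

```python
from collections import OrderedDict

class TileCache:
    """Least-recently-used cache for terrain tiles. `load_tile` is a
    hypothetical loader mapping a tile id to its data (in TILEDVS it
    would decompress the tile from disk). Illustrative sketch only."""

    def __init__(self, capacity, load_tile):
        self.capacity = capacity
        self.load_tile = load_tile
        self.tiles = OrderedDict()   # insertion order = recency order

    def get(self, tile_id):
        if tile_id in self.tiles:
            self.tiles.move_to_end(tile_id)      # mark as recently used
        else:
            if len(self.tiles) >= self.capacity:
                self.tiles.popitem(last=False)   # evict least recently used
            self.tiles[tile_id] = self.load_tile(tile_id)
        return self.tiles[tile_id]
```

Sizing `capacity` to hold a whole row of tiles matches the sweep's access pattern, which is why no tile needs to be read more than twice.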
This paper proposes an efficient parallel heuristic for siting observers on raster terrains. More specifically, the goal is to choose the smallest set of points on a terrain such that observers located at these points can see at least a given percentage of the terrain. This problem is NP-hard and has several applications, such as determining the best places to position (site) communication or monitoring towers on a terrain. Since siting observers is a massive operation, its solution requires a huge amount of processing time even to obtain an approximate solution using a heuristic. This is even more evident when processing high-resolution terrains that have become available thanks to modern data acquisition technologies such as LIDAR and IFSAR. Our new implementation uses dynamic programming and CUDA to accelerate the swap local search heuristic, which was proposed in previous works. Also, to efficiently use the parallel computing resources of GPUs, we adapted some techniques previously developed for sparse-dense matrix multiplication. We compared this new method with previous parallel implementations: the new method is much more efficient, it can process much larger terrains (the older methods are restrictive about terrain size), and it is faster.
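The swap local search mentioned above iteratively replaces a chosen observer with an unchosen candidate whenever the replacement increases total coverage. A simplified serial sketch (the paper accelerates this with dynamic programming and CUDA; names here are hypothetical):

```python
def coverage(viewsheds, chosen):
    """Number of terrain cells seen by at least one chosen observer.
    `viewsheds` maps each candidate observer to the set of cells it sees."""
    seen = set()
    for obs in chosen:
        seen |= viewsheds[obs]
    return len(seen)

def swap_local_search(viewsheds, chosen):
    """Repeat: swap one chosen observer for one unchosen candidate if
    the swap strictly increases coverage; stop at a local optimum."""
    chosen = set(chosen)
    improved = True
    while improved:
        improved = False
        for out in list(chosen):
            for cand in set(viewsheds) - chosen:
                trial = (chosen - {out}) | {cand}
                if coverage(viewsheds, trial) > coverage(viewsheds, chosen):
                    chosen = trial
                    improved = True
                    break
            if improved:
                break
    return chosen
```

Each swap evaluation unions many sparse visibility sets, which is what maps naturally onto the sparse-dense matrix techniques adapted for the GPU.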