2011
DOI: 10.1260/1748-3018.5.2.341
CUDA Memory Optimizations for Large Data-Structures in the Gravit Simulator

Abstract: Modern GPUs open a completely new field for optimizing embarrassingly parallel algorithms. Implementing an algorithm on a GPU confronts the programmer with a new set of challenges for program optimization: in particular, tuning the program for the GPU memory hierarchy, whose organization and performance implications are radically different from those of general-purpose CPUs, and optimizing programs at the instruction level for the GPU. In this paper we analyze different approaches for optimizing the memory usage and a…

Cited by 9 publications (14 citation statements)
References 13 publications (9 reference statements)
“…These models greatly improve on the premature convergence, or convergence to locally optimal solutions, seen in past GA methods that use a single large population. To take advantage of the multitasking features of multi-core CPUs [15], these models implement parallel computing on the CPU by assigning blocks that have not yet been processed to unused cores. The GPU differs greatly from the CPU in parallel computing.…”
Section: Parallel GA
confidence: 99%
“…Computers usually use long sequences of numbers and a seed for table lookup to generate random numbers. We use NVIDIA's cuRAND kernel API [14,15] and treat each thread ID as that thread's seed, so that each thread has its own independent random sequence. A global evaluation is performed after each thread evaluates its own chromosomes and saves the results into global memory.…”
Section: Parallel SIMD-based Algorithm
confidence: 99%
“…Using GPU streams, commands for memory transfers and kernel executions that belong to different streams can be overlapped. In our implementation, we store data in the structure-of-arrays (SoA) format [28] to maximize the use of coalesced memory transactions. In the SoA format, the individual attributes of each record are stored contiguously, so that component-wise memory access by threads is possible regardless of the record size.…”
Section: GPU Overhead
confidence: 99%
“…This results in as many uncoalesced reads from global memory as there are dimensions of data whenever a thread must access the elements of such a structure. Conversely, the SoA format guarantees that all reads from global memory are coalesced, regardless of the number of dimensions, since all threads of the same half-warp access consecutive single values in global memory [28]. An in-memory R-tree called Q-tree (‘Query-tree’) is used for managing the GPU buffer.…”
Section: GPU-based Range Query
confidence: 99%
“…extensively discussed in technical manuals for various many-core devices, e.g., the CPU [7], the GPU [14], or the Cell processor [5]. The major choices of AoS and SoA can be further refined into hybrid formats, e.g., arrays of structures of arrays [1] or structures of arrays of structures [16].…”
Section: Introduction
confidence: 99%