Parallel processors have undergone a profound transformation in recent years, transitioning from homogeneous general-purpose units to a heterogeneous ecosystem that combines general-purpose and special-purpose cores on a single chip. This shift, driven by the demands of Artificial Intelligence (AI) and computer graphics applications, has not only altered processor architecture but has also introduced novel challenges in optimizing algorithms for parallel execution. In this brief review, we examine the evolution of parallel processors and explore the research challenges arising from this shift. We focus on the particular case of GPUs, where tensor cores and ray tracing cores have opened new research opportunities: identifying which applications, beyond AI and graphics, can be reformulated as a series of tensor-core or ray-tracing-core operations and thereby accelerated relative to their regular GPU implementations.