2014 International Conference on High Performance Computing & Simulation (HPCS)
DOI: 10.1109/hpcsim.2014.6903670
Analysis of classic algorithms on GPUs

Abstract: The recently developed Threaded Many-core Memory (TMM) model provides a framework for analyzing algorithms for highly-threaded many-core machines such as GPUs. In particular, it tries to capture the fact that these machines hide memory latencies via the use of a large number of threads and large memory bandwidth. The TMM model analysis contains two components: computational complexity and memory complexity. A model is only useful if it can explain and predict empirical data. In this work, we investigat…
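The latency-hiding behavior that the TMM model is built around can be seen in a minimal CUDA sketch (an illustrative example, not code from the paper): a memory-bound kernel issues global loads that each stall for hundreds of cycles, and the hardware hides those stalls only because far more threads are resident than there are cores.

#include <cuda_runtime.h>

// Memory-bound SAXPY kernel: each thread performs two global loads and
// one global store, so per-thread time is dominated by memory latency.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 24;
    float *x, *y;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));
    cudaMemset(x, 0, n * sizeof(float));
    cudaMemset(y, 0, n * sizeof(float));

    // Launching tens of thousands of blocks oversubscribes the cores;
    // while one warp waits on memory, the scheduler runs another. This
    // overlap is what the TMM model's memory-complexity term accounts for.
    int block = 256;
    int grid  = (n + block - 1) / block;
    saxpy<<<grid, block>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();

    cudaFree(x);
    cudaFree(y);
    return 0;
}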
Cited by 10 publications (3 citation statements). References 27 publications (39 reference statements).
“…Theoretically, as long as $n_K = O(n_G \log n_G)$, the sequential time complexity is $O(n_T n_R n_G \log n_G)$. We achieve near-linear speed-up by distributing the work on multicore CPUs/GPUs, but the precise complexity analysis depends on the GPU architecture [54]. If we use the same resolution to resolve the cutter as the grid size for convolution, then $n_K \ll n_G$, since the cutter is typically much smaller in size than the design domain.…”
Section: Discussion (mentioning; confidence: 99%)
“…As to asymptotic models, Ma et al. [10] designed the Threaded Many-core Memory (TMM) model, in which a number of classic algorithms are analyzed in terms of both their computational complexity and their memory complexity, assuming perfect scheduling [11], [12]. Kirtzic et al. [13] proposed the Parallel GPU Model (PGM), which is essentially an adaptation of the Bulk-Synchronous Parallel (BSP) model [14] and equates a superstep in BSP with a function unit of a GPU program.…”
Section: Introduction (mentioning; confidence: 99%)
“…[13] Utilizing shared memory on a GPU would be beneficial in terms of processing speed in some applications [17, 18]. However, because of the size restriction of shared memory, it is not feasible to utilize such memory in APC algorithm implementations. As described in Secs.…”
Section: Memory Space Usage (mentioning; confidence: 99%)
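The size restriction mentioned here is visible directly in CUDA source: a __shared__ array must fit in the per-block shared-memory budget (48 KB on many devices), so it only works for small tiles, while larger working sets must stay in global memory. A minimal sketch (the tile size and reduction kernel are illustrative, unrelated to the APC implementation):

#include <cuda_runtime.h>

#define TILE 1024   // 1024 floats = 4 KB, comfortably under the limit;
                    // a multi-megabyte working set could NOT be declared
                    // this way, which is the restriction noted above.

// Each block stages one small tile into on-chip shared memory, then
// thread 0 reduces it. Launch with blockDim.x == TILE.
__global__ void tile_sum(const float *in, float *out, int n) {
    __shared__ float tile[TILE];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    if (threadIdx.x == 0) {
        float acc = 0.0f;
        for (int j = 0; j < TILE; ++j) acc += tile[j];
        out[blockIdx.x] = acc;
    }
}

int main() {
    const int n = 1 << 20;
    const int blocks = n / TILE;
    float *in, *out;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, blocks * sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));

    tile_sum<<<blocks, TILE>>>(in, out, n);
    cudaDeviceSynchronize();

    cudaFree(in);
    cudaFree(out);
    return 0;
}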