Proceedings of the 36th Annual International Symposium on Computer Architecture 2009
DOI: 10.1145/1555754.1555775

An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness

Abstract: GPU architectures are increasingly important in the multi-core era due to their high number of parallel processors. Programming thousands of massively parallel threads is a big challenge for software engineers, but understanding the performance bottlenecks of those parallel programs on GPU architectures to improve application performance is even more difficult. Current approaches rely on programmers to tune their applications by exploiting the design space exhaustively without fully understanding the performan…

Cited by 403 publications (98 citation statements, published 2010–2017); references 18 publications.
“…Threads in one block cannot communicate with threads in another block, as they may be scheduled at different times. This architecture implies that any job to be run on a GPU has to be broken into blocks of computation that can run independently, without communicating with each other [32]. These blocks must be further broken down into smaller tasks that execute on individual threads, which may communicate with other threads in the same block.…”
Section: Vertical Scaling Platforms
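
A minimal CUDA sketch of this decomposition (the kernel and variable names are illustrative, not from the cited work): each block independently reduces its own slice of the input, threads within a block cooperate through shared memory and __syncthreads(), and no two blocks communicate during the kernel.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each block reduces its slice of `in` independently; threads inside a
// block share data via shared memory, but blocks never exchange data.
// Assumes blockDim.x is a power of two and at most 256.
__global__ void blockSum(const float* in, float* blockOut, int n) {
    __shared__ float partial[256];                  // per-block scratch
    int tid = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + tid;
    partial[tid] = (gid < n) ? in[gid] : 0.0f;
    __syncthreads();                                // intra-block barrier only
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {  // tree reduction
        if (tid < s) partial[tid] += partial[tid + s];
        __syncthreads();
    }
    if (tid == 0) blockOut[blockIdx.x] = partial[0]; // one result per block
}

int main() {
    const int n = 1 << 16, threads = 256, blocks = n / threads;
    float *in, *out;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, blocks * sizeof(float));
    // ... fill `in`, then compute one independent partial sum per block.
    blockSum<<<blocks, threads>>>(in, out, n);
    cudaDeviceSynchronize();
    cudaFree(in); cudaFree(out);
    return 0;
}
```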
“…CUDA programs on the host (CPU) invoke a kernel which runs on the device (GPU). All threads within a block are executed concurrently on the architecture [14]. In addition, when a multiprocessor is given one or more thread blocks to execute, it partitions them into groups of 32 parallel threads termed warps.…”
Section: Preliminaries
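
A minimal sketch of this host/device split (kernel and sizes are our own illustration): the host launches the kernel, and the hardware partitions each thread block into warps of 32 threads.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float* data, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= a;   // one element per thread
}

int main() {
    const int n = 1 << 20;
    float* d;
    cudaMalloc(&d, n * sizeof(float));

    // Host (CPU) invokes the kernel on the device (GPU). Each
    // 128-thread block is partitioned by the hardware into
    // 128 / 32 = 4 warps of 32 threads that execute together.
    int threadsPerBlock = 128;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    scale<<<blocks, threadsPerBlock>>>(d, 2.0f, n);
    cudaDeviceSynchronize();

    cudaFree(d);
    return 0;
}
```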
“…Guo et al. [9] presented a performance modeling and optimization analysis to predict and optimize SpMV performance on GPUs. A simple analytical GPU model to predict the execution time of massively parallel programs was given by Hong et al. [14]. Schaa et al. [15] presented a model to accurately estimate the execution time of GPU applications under varying configurations.…”
Section: Introduction
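
In simplified, paraphrased form (not the paper's full equations), the model of Hong et al. [14] is built around two quantities per streaming multiprocessor: memory warp parallelism (MWP), the number of warps that can overlap memory accesses, and computation warp parallelism (CWP), how many warps' worth of computation fits under one memory access.

```latex
% Core quantities of the Hong-Kim analytical model [14], paraphrased;
% N = number of active warps per streaming multiprocessor.
\[
  \mathrm{MWP} \approx \min\!\left(
      \frac{\mathrm{Mem\_latency}}{\mathrm{Departure\_delay}},\; N\right),
  \qquad
  \mathrm{CWP} = \min\!\left(
      \frac{\mathrm{Mem\_cycles} + \mathrm{Comp\_cycles}}
           {\mathrm{Comp\_cycles}},\; N\right)
\]
% When MWP >= CWP, computation hides memory latency and the predicted
% execution time is roughly compute-bound; when CWP > MWP, memory
% requests serialize and the memory system dominates the estimate.
```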
“…[15] evaluated different placements of the memory controller in many-core CMPs. A full system was simulated to test communication performance in order to find an optimal placement.…”
Section: Related Work