Benchmarking the Memory Hierarchy of Modern GPUs

Memory access efficiency is a key factor in fully exploiting the computational power of Graphics Processing Units (GPUs). However, many details of the GPU memory hierarchy are not released by the vendors. We propose a novel fine-grained benchmarking approach and apply it to two popular GPU architectures, Fermi and Kepler, to expose previously unknown characteristics of their memory hierarchies. Specifically, we investigate the structures of different cache systems, such as the data cache, the texture cache, and the translation lookaside buffer (TLB). We also investigate the impact of bank conflicts on shared memory access latency. Our benchmarking results offer a better understanding of the GPU memory hierarchy, which can help in software optimization and in the modelling of GPU architectures. Our source code and experimental results are publicly available.
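
To make the bank-conflict notion in the abstract concrete, here is a minimal Python sketch of how a strided access pattern maps onto shared memory banks. It is not the paper's benchmark and does not measure latency. The function name `conflict_degree` is hypothetical, and the layout of 32 banks of 4-byte words with 32 threads per warp is an assumption based on the commonly documented configuration (Kepler can also use 8-byte banks, which this model ignores).

```python
from collections import defaultdict


def conflict_degree(stride_words, num_threads=32, num_banks=32):
    """Worst-case number of distinct words that land in the same bank.

    Assumed model: thread t reads word t * stride_words, and word w
    lives in bank w % num_banks. Threads reading the same word do not
    conflict (broadcast), so only distinct words are counted.
    """
    words_per_bank = defaultdict(set)
    for tid in range(num_threads):
        word = tid * stride_words
        words_per_bank[word % num_banks].add(word)
    return max(len(words) for words in words_per_bank.values())


if __name__ == "__main__":
    # A degree of 1 means conflict-free; a degree of n means the
    # access is serialized into n passes under this model.
    for stride in (1, 2, 3, 4, 8, 16, 32):
        print(f"stride {stride:2d}: {conflict_degree(stride)}-way")
```

Under these assumptions, odd strides are conflict-free and a stride of 32 words sends every thread to the same bank, which is the kind of pattern a latency benchmark would be expected to expose.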

G-CRS: GPU Accelerated Cauchy Reed-Solomon Coding

Liu, Wang, Chu, et al. 2018

IEEE Trans. Parallel Distrib. Syst.

PErasure: A parallel Cauchy Reed-Solomon coding library for GPUs

Chu, Liu, Ouyang, et al. 2015

In recent years, erasure coding has been adopted by large-scale cloud storage systems to replace data replication. With the increase of disk I/O throughput and network bandwidth, the speed of erasure coding has become one of the key system bottlenecks. In this paper, we propose to offload the task of erasure coding to Graphics Processing Units (GPUs). Specifically, we have designed and implemented PErasure, a parallel Cauchy Reed-Solomon (CRS) coding library. We compare the performance of PErasure with that of two state-of-the-art libraries: Jerasure (for CPUs) and Gibraltar (for GPUs). Our experiments show that the raw coding speed of PErasure on a $500 Nvidia GTX780 card is about 10 times that of multithreaded Jerasure on a modern quad-core CPU, and 2-4 times that of Gibraltar on the same GPU. PErasure can achieve up to 10 GB/s of overall encoding speed using just a single GPU for a large storage system that can withstand up to 8 disk failures.

IEEE ICC 2015 SAC - Data Storage and Cloud Computing. ©2015 IEEE
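
As a rough illustration of the coding scheme the abstract refers to, the following Python sketch encodes k data chunks into m parity chunks with a Cauchy matrix over GF(2^8) and recovers the data from any k surviving chunks. It is not PErasure's code: it runs on the CPU, uses table-based field multiplication rather than the XOR-based bit-matrix form that CRS libraries typically use, and every function name in it is hypothetical.

```python
# GF(2^8) log/antilog tables; the primitive polynomial 0x11d is an assumption.
EXP = [0] * 512
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_inv(a):
    return EXP[255 - LOG[a]]  # a must be nonzero


def cauchy_matrix(k, m):
    # Entry (i, j) is 1 / (x_i + y_j) with x_i = i and y_j = m + j.
    # Addition in GF(2^8) is XOR; requires k + m <= 256.
    return [[gf_inv(i ^ (m + j)) for j in range(k)] for i in range(m)]


def mat_vec(rows, chunks):
    """Multiply a GF(2^8) matrix by a list of equal-length byte chunks."""
    out = []
    for row in rows:
        acc = bytearray(len(chunks[0]))
        for coef, chunk in zip(row, chunks):
            for pos, byte in enumerate(chunk):
                acc[pos] ^= gf_mul(coef, byte)
        out.append(bytes(acc))
    return out


def invert(matrix):
    """Gauss-Jordan inversion over GF(2^8); the matrix must be nonsingular."""
    n = len(matrix)
    aug = [list(r) + [int(i == j) for j in range(n)] for i, r in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = gf_inv(aug[col][col])
        aug[col] = [gf_mul(scale, v) for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v ^ gf_mul(f, w) for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def encode(data, m):
    """Return m parity chunks for the given list of data chunks."""
    return mat_vec(cauchy_matrix(len(data), m), data)


def decode(survivors, k, m):
    """Rebuild the k data chunks from any k surviving chunks.

    survivors maps chunk index to bytes: 0..k-1 are data, k..k+m-1 parity.
    """
    cauchy = cauchy_matrix(k, m)
    idx = sorted(survivors)[:k]
    rows = [[int(c == i) for c in range(k)] if i < k else cauchy[i - k] for i in idx]
    return mat_vec(invert(rows), [survivors[i] for i in idx])


if __name__ == "__main__":
    data = [b"abcd", b"efgh", b"ijkl", b"mnop"]
    parity = encode(data, 2)
    # Lose data chunks 1 and 3, then recover them from what is left.
    left = {0: data[0], 2: data[2], 4: parity[0], 5: parity[1]}
    print(decode(left, k=4, m=2) == data)
```

Each output byte depends only on the bytes at the same position in the input chunks, so the inner loops are independent across positions. That independence is what makes this kind of coding a natural candidate for GPU offloading.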

Towards more efficient ophthalmic disease classification and lesion location via convolution transformer

Wen, Jian, Xiang, et al. 2022

Computer Methods and Programs in Biomedicine

A Quantitative Survey of Communication Optimizations in Distributed Deep Learning

et al. 2021

Chengjian Liu

Benchmarking the Memory Hierarchy of Modern GPUs
