2018
DOI: 10.1007/978-3-031-01759-9
General-Purpose Graphics Processor Architectures

Abstract: Synthesis Lectures on Computer Architecture publishes 50- to 100-page publications on topics pertaining to the science and art of designing, analyzing, selecting, and interconnecting hardware components to create computers that meet functional, performance, and cost goals. The scope will largely follow the purview of premier computer architecture conferences, such as ISCA, HPCA, MICRO, and ASPLOS.

Cited by 21 publications (7 citation statements) · References 106 publications
“…A memory partition unit comprises L2 Cache Banks, one or more memory access schedulers, and a raster operation (ROP). Multiple memory partition units exist in a GPGPU, with L2 Cache banks serving as data caches, memory access schedulers reordering memory read and write operations and dispatching them to DRAM for enhanced access efficiency, and ROP handling graphic and atomic operations [13,14,15].…”
Section: C-prefetcher Designmentioning
confidence: 99%
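The memory access scheduler described above reorders pending reads and writes so that requests to an already-open DRAM row are serviced together. A minimal sketch of that idea, assuming a simplified first-ready, first-come first-served (FR-FCFS) policy over `(row, address)` request tuples (the request format and policy details are illustrative, not taken from the cited work):

```python
# Illustrative sketch: a simplified FR-FCFS-style memory access scheduler,
# the kind of reordering a GPU memory partition unit performs to improve
# DRAM row-buffer locality. Request tuples and policy are assumptions.
from collections import deque

def schedule_fr_fcfs(requests, open_row=None):
    """Reorder (row, address) requests: row-buffer hits first,
    then oldest-first among the rest."""
    pending = deque(requests)
    order = []
    while pending:
        # Prefer the oldest request that hits the currently open row.
        hit = next((r for r in pending if r[0] == open_row), None)
        req = hit if hit is not None else pending[0]
        pending.remove(req)
        open_row = req[0]          # servicing a request opens its row
        order.append(req)
    return order

# Requests arrive interleaved across two rows; the scheduler
# groups the row hits so they are serviced back to back.
reqs = [("rowA", 0), ("rowB", 1), ("rowA", 2), ("rowB", 3)]
print(schedule_fr_fcfs(reqs))
# → [('rowA', 0), ('rowA', 2), ('rowB', 1), ('rowB', 3)]
```

The enhanced access efficiency the passage mentions comes precisely from this grouping: consecutive hits to an open row avoid the precharge/activate cost of switching rows.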
“…While the cores inside the same core cluster (streaming multiprocessor, SM) have access to the scratchpad memory (shared memory or L1 cache), all the cores can communicate through the L2 cache structure via the interconnect. DRAM-based global device memory maintains larger but relatively slower data access for all threads executing in the device [22]. Not only does a modern GPU device include general-purpose cores but also special function units (SFU) for fast transcendental function computations as well as tensor cores for efficient matrix multiplications.…”
Section: Gpu Architecturesmentioning
confidence: 99%
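The hierarchy quoted above (per-SM scratchpad/L1, a device-wide L2 reached over the interconnect, and DRAM-backed global memory) can be sketched as a simple lookup that walks the levels in order; the dictionary-based caches and fill behavior here are a toy model, not an implementation detail from the cited text:

```python
# Illustrative toy model of the GPU memory hierarchy described above:
# a load checks the per-SM L1, then the shared L2, then DRAM-backed
# global memory, filling caches on the way back (an assumption for
# simplicity; real fill policies vary).
def load(addr, l1, l2, dram):
    """Resolve a load through the hierarchy.
    Returns (value, level_that_serviced_it)."""
    if addr in l1:
        return l1[addr], "L1"
    if addr in l2:
        l1[addr] = l2[addr]      # fill the per-SM L1 on the way back
        return l2[addr], "L2"
    value = dram[addr]
    l2[addr] = value             # fill both cache levels
    l1[addr] = value
    return value, "DRAM"

dram = {0x10: 42}
l1, l2 = {}, {}
print(load(0x10, l1, l2, dram))  # first access must go to DRAM
print(load(0x10, l1, l2, dram))  # repeat access hits the per-SM L1
# → (42, 'DRAM') then (42, 'L1')
```

The point of the sketch is the locality argument in the passage: threads on the same SM see their repeats serviced by the fast local level, while cross-SM communication has to go through the shared L2.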
“…In this section, we provide an overview of these components involved in atomic execution. Note that, since the architecture of a GPU is a black box, we explicitly refer to the work of Aamodt et al. and Glasco et al. [15,16] for our work. We highly recommend these articles for more insights.…”
Section: Atomics In Gpumentioning
confidence: 99%
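One way to picture atomic execution on a GPU is as a read-modify-write serialized at a single point near the memory partition (the earlier citation statement notes that the ROP handles atomic operations). The sketch below is my own assumption-laden model, not a description from Aamodt et al. or Glasco et al.: it emulates an `atomicAdd`-style operation as a compare-and-swap retry loop, with a lock standing in for the hardware serialization point:

```python
# Illustrative sketch (an assumption, not from the cited works): atomic
# read-modify-write modeled as a compare-and-swap retry loop over a
# shared location, with a lock standing in for the hardware's single
# serialization point near the L2/ROP.
import threading

memory = {"counter": 0}
_serialize = threading.Lock()    # stands in for per-partition serialization

def compare_and_swap(addr, expected, new):
    """Atomically replace memory[addr] with `new` iff it equals `expected`."""
    with _serialize:
        if memory[addr] == expected:
            memory[addr] = new
            return True
        return False

def atomic_add(addr, val):
    """atomicAdd-style RMW: retry the CAS until no other thread intervened."""
    while True:
        old = memory[addr]
        if compare_and_swap(addr, old, old + val):
            return old           # like CUDA's atomicAdd, return the prior value

# Eight concurrent increments: none are lost, unlike a plain read+write.
threads = [threading.Thread(target=atomic_add, args=("counter", 1))
           for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
print(memory["counter"])         # → 8
```

A plain `memory[addr] += 1` from several threads could lose updates between the read and the write; forcing every modification through one serialization point is what makes the operation atomic, and is why GPU atomics to the same address from many cores serialize.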