Proceedings of the 42nd Annual International Symposium on Computer Architecture 2015
DOI: 10.1145/2749469.2750399
A case for core-assisted bottleneck acceleration in GPUs

Abstract: Modern Graphics Processing Units (GPUs) are well provisioned to support the concurrent execution of thousands of threads. Unfortunately, different bottlenecks during execution and heterogeneous application requirements create imbalances in utilization of resources in the cores. For example, when a GPU is bottlenecked by the available off-chip memory bandwidth, its computational resources are often overwhelmingly idle, waiting for data from memory to arrive. This paper introduces the Core-Assisted Bottleneck Accel…

Cited by 76 publications (5 citation statements) · References 86 publications (127 reference statements)
“…After doubling the off-chip bandwidth, no application remains bandwidth limited, and therefore, increasing the off-chip bandwidth to 4× and 8× has little effect on performance. It may be possible to achieve the 2× extra bandwidth by using data compression [37] with little change to the architecture of existing GPUs. While technologies like 3D DRAM that offer significantly more bandwidth (and lower access latency) can be useful, they are not necessary for providing the off-chip bandwidth requirements of NGPU for the range of applications that we studied.…”
Section: Results
confidence: 99%
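The excerpt above argues that data compression can stand in for 2× more physical off-chip bandwidth. As a rough illustration (a toy base+delta scheme written for this report, not the actual compression algorithm of reference [37]), the sketch below compresses a 32-byte cache line of similar values into a base word plus one-byte deltas and computes the resulting effective-bandwidth multiplier:

```python
import struct

def compress_line(words):
    """Toy base+delta compression: store the first 4-byte word as a base
    and every word as a 1-byte signed delta from it, if all deltas fit."""
    base = words[0]
    deltas = [w - base for w in words]
    if all(-128 <= d <= 127 for d in deltas):
        return struct.pack("<I", base) + bytes(d & 0xFF for d in deltas)
    return None  # incompressible: caller falls back to the raw line

def decompress_line(blob, n_words):
    """Invert compress_line: recover each word from base + signed delta."""
    base = struct.unpack_from("<I", blob)[0]
    deltas = blob[4:4 + n_words]
    return [(base + (d - 256 if d >= 128 else d)) & 0xFFFFFFFF
            for d in deltas]

# One 8-word (32-byte) cache line of values clustered around 1000.
line = [1000, 1001, 1003, 1000, 999, 1007, 1002, 1005]
blob = compress_line(line)
assert decompress_line(blob, len(line)) == line

raw_bytes = 4 * len(line)            # 32 bytes uncompressed
ratio = raw_bytes / len(blob)        # bytes saved per transfer
print(f"compressed {raw_bytes} -> {len(blob)} bytes, "
      f"effective bandwidth x{ratio:.2f}")
```

With a 2.67× ratio on lines like this one, a bandwidth-bound kernel sees proportionally more useful data per unit of physical bandwidth, which is the effect the citing authors estimate would remove their remaining bandwidth bottleneck.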
“…The warped-compression architecture also supports compressed execution, that is, some instructions are processed without decompressing the operand values, to further save energy. Vijaykumar et al [46] propose a core-assisted bottleneck acceleration (CABA) framework for GPUs, in which assist warps are automatically generated to perform specific tasks that speed up application execution. Instead of a hardware-based implementation, CABA uses assist warps to enable flexible data compression in the memory hierarchy.…”
Section: Related Work
confidence: 99%
“…• GPU Partitioning: Although a GPU is viewed as a single accelerator device by application tasks, it consists of many GPU cores that execute a given parallel workload in an aggregate manner. Hence, depending on the characteristics of workloads, only some of the GPU cores may be utilized [179]. To address this GPU underutilization problem, some recent GPU architectures, e.g., NVIDIA Kepler [180], introduce a feature to execute multiple GPU functions concurrently.…”
Section: Architecture Support For Computational Accelerators
confidence: 99%
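The underutilization argument in the excerpt above is a scheduling one: if each workload occupies only part of the device, running them concurrently recovers the idle capacity. A minimal sketch of that effect in plain Python (standing in for concurrent kernel launches; the waiting here emulates a kernel that leaves most of the device idle, and is not a GPU API):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def kernel(name, duration):
    """Stand-in for a GPU function that occupies only part of the device:
    it mostly waits, leaving capacity for another kernel to run alongside."""
    time.sleep(duration)
    return name

# Serial execution: the second kernel waits for the first to finish.
start = time.perf_counter()
serial = [kernel("A", 0.2), kernel("B", 0.2)]
serial_time = time.perf_counter() - start

# Concurrent execution: both kernels occupy the device at once,
# analogous to the concurrent-kernel feature described above.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(kernel, "A", 0.2), pool.submit(kernel, "B", 0.2)]
    concurrent = [f.result() for f in futures]
concurrent_time = time.perf_counter() - start

print(f"serial: {serial_time:.2f}s, concurrent: {concurrent_time:.2f}s")
```

Serial execution takes roughly the sum of the two durations, while the concurrent run takes roughly the maximum, which is the utilization gain the concurrent-kernel feature targets.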