2012
DOI: 10.1145/2366231.2337169

Boosting mobile GPU performance with a decoupled access/execute fragment processor

Abstract: Smartphones represent one of the fastest growing markets, providing significant hardware/software improvements every few months. However, supporting these capabilities reduces the operating time per battery charge. The CPU/GPU component is left with only a shrinking fraction of the power budget, since most of the energy is consumed by the screen and the antenna. In this paper, we focus on improving the energy efficiency of the GPU since graphical applications constitute an important part of the existing…

Citations: cited by 8 publications (7 citation statements)
References: 23 publications
“…Based on the outcome of the computation, a new set of events are created in parallel and arrive at the event block (4). The event block applies boolean algebra on the events (as configured for that region with Event-Condition-Action (ECA) rules) to generate any actions which are delivered as action indices to the action block (5). The action block takes the indices, selects the ones that could be issued in the current cycle, and controls the data bus to move the data values to the accelerator or between event queues (6, 7).…”
Section: Accelerators (NPU); citation type: mentioning
confidence: 99%
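
The event block / action block flow quoted above can be pictured with a small sketch. This is only an illustration under stated assumptions, not the citing paper's design: the `Rule` encoding, the event names, and the `issue_width` limit used to model "could be issued in the current cycle" are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass(frozen=True)
class Rule:
    """One Event-Condition-Action rule (hypothetical encoding)."""
    condition: Callable[[Set[str]], bool]  # boolean expression over raised events
    action_index: int                      # index handed to the action block


def event_block(events: Set[str], rules: List[Rule]) -> List[int]:
    """Evaluate every rule's boolean condition and collect the action indices that fire."""
    return [rule.action_index for rule in rules if rule.condition(events)]


def action_block(indices: List[int], issue_width: int) -> Tuple[List[int], List[int]]:
    """Split fired actions into those issued this cycle and those deferred.

    Assumption: issue capacity is modeled as a simple per-cycle width limit.
    """
    return indices[:issue_width], indices[issue_width:]


rules = [
    Rule(lambda e: "a" in e and "b" in e, 0),
    Rule(lambda e: "a" in e or "c" in e, 1),
    Rule(lambda e: "c" in e, 2),
]
fired = event_block({"a", "b"}, rules)
issued, deferred = action_block(fired, issue_width=1)
```

With events `a` and `b` raised, rules 0 and 1 fire; with an assumed issue width of one, action 0 is issued and action 1 waits for a later cycle.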
“…We combine the power of fetch, decode, dispatch, issue, RF and bypass logic as the red bar and show the functional unit power and load-store unit power separately 5 . Within MAD2, the computation block's power is 639 mW (closely matching and slightly less than FU power of OOO2), with the rest consuming 112 mW; the LSU is 800 mW.…”
Section: Power Breakdown; citation type: mentioning
confidence: 99%
“…Decoupled Execution: Recently, there has been renewed interest in decoupled execution for improving performance in GPUs and manycore [6,1]. Decoupled execution leverages the compiler to partition a single thread of execution into separate memory-accessing and memory-consuming instruction streams called strands, which communicate data and control flow decisions with one another through FIFO data queues [22].…”
Section: Goals and Latency Tolerance Techniques; citation type: mentioning
confidence: 99%
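
As a rough illustration of the decoupled execution idea in the statement above, the sketch below splits one loop into a memory-accessing strand and a memory-consuming strand joined by a bounded FIFO. It is a toy software model under assumptions of my own: the function name `run_decoupled`, the `depth` parameter, and the one-value-per-step consumer are hypothetical, and the control-flow queue mentioned in the quote is not modeled.

```python
from collections import deque
from typing import List, Tuple


def run_decoupled(memory: List[int], addresses: List[int], depth: int = 4) -> Tuple[int, int]:
    """Sum memory[addr] for each address using two strands and a bounded FIFO.

    Returns (sum, max_lead), where max_lead is the largest number of loaded
    values waiting in the FIFO, i.e. how far the access strand ran ahead.
    """
    fifo: deque = deque()
    next_load = 0   # position of the access strand
    consumed = 0    # position of the execute strand
    total = 0
    max_lead = 0
    while consumed < len(addresses):
        # Access strand: issue loads ahead of the consumer while the FIFO has room.
        while next_load < len(addresses) and len(fifo) < depth:
            fifo.append(memory[addresses[next_load]])
            next_load += 1
        max_lead = max(max_lead, len(fifo))
        # Execute strand: consume the oldest loaded value, in order.
        total += fifo.popleft()
        consumed += 1
    return total, max_lead


memory = [10, 20, 30, 40, 50, 60]
result = run_decoupled(memory, [5, 0, 3, 1, 2], depth=2)
```

The FIFO depth bounds how far the access strand can run ahead of the execute strand; in hardware that lead is what hides memory latency, while here it only shows up as the `max_lead` value.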