Proceedings of the 48th International Symposium on Microarchitecture 2015
DOI: 10.1145/2830772.2830785

Confluence

Abstract: Multi-megabyte instruction working sets of server workloads defy the capacities of latency-critical instruction-supply components of a core: the instruction cache (L1-I) and the branch target buffer (BTB). Recent work has proposed dedicated prefetching techniques aimed separately at L1-I and BTB, resulting in high metadata costs and/or only modest performance improvements due to the complex control-flow histories required to effectively fill the two components ahead of the core's fetch stream. This work makes th…

Cited by 34 publications (27 citation statements)
References 44 publications
“…With the increasing demands on BTB capacity as program loads grow, reducing power consumption in BTB design is becoming increasingly vital. Kaynak et al proposed the AirBTB, a basic block BTB design that employs a branch bitmap for each branch table entry, simplifying the identification of branch instructions within cache blocks [10]. Chang et al [21] proposed an energy-efficient BTB lookup scheme for embedded processors, emphasizing the filtering of sequential execution instructions to decrease BTB access.…”
Section: Power Optimization and Capacity (mentioning)
confidence: 99%
“…
Convention BTB | 100% | 0%
Air-BTB [10] | 26.4-61.5% | 26.4-61.5%
TG-BTB [12] | 10.75-22.5% | 67.55-89.25%
FIL-BTB [26] | 10-30% | 70-90%
The proposed scheme | 3-14% | 86-97%
…”
Section: Schemes | Power Consumption | Decrease Power Consumption (mentioning)
confidence: 99%
“…While highly effective, each prefetcher requires hundreds of kilobytes of metadata storage per core. Recent temporal streaming research has focused on lowering the storage costs [8,13,14]; however, even with optimizations, for a many-core CMP running several consolidated workloads, the total storage requirements can reach into megabytes.…”
Section: Introduction (mentioning)
confidence: 99%
“…Meanwhile, conditional branches are maintained in a separate small-capacity BTB. By exploiting prior observations on control flow commonality in instruction and BTB working sets [14],…”
Section: Introduction (mentioning)
confidence: 99%