2018
DOI: 10.1145/3202663
Combining Software Cache Partitioning and Loop Tiling for Effective Shared Cache Management

Abstract: One of the biggest challenges in multicore platforms is shared cache management, especially for data-dominant applications. Two commonly used approaches for increasing shared cache utilization are cache partitioning and loop tiling. However, state-of-the-art compilers lack efficient cache partitioning and loop tiling methods for two reasons. First, cache partitioning and loop tiling are strongly coupled together, and thus addressing them separately is simply not effective. Second, cache partitioning and loop t…

Cited by 5 publications (5 citation statements). References 49 publications.
“…This method is applicable to all single-core and shared cache multi-core CPUs. In this section, we explain our method for single core CPUs which is applicable to shared cache CPUs too, by using the software shared cache partitioning method given in our previous work [21]; no more than p threads can run in parallel (one on each core), where p is the number of cores (single threaded codes only).…”
Section: Proposed Methodology
confidence: 99%
“…No more than p threads run in parallel, one to each core, where p is the number of the cores. Different threads access only their assigned shared cache space and thus different thread tiles do not conflict with each other [21].…”
Section: Approximate the Number Of Memory Accesses And Arithmetical I…
confidence: 99%
“…In [21], authors use an autotuning method to find the tile sizes, when the outermost loop is parallelised. In [11], loop tiling is combined with cache partitioning to improve performance in shared caches. Finally, in [22], a hybrid model is proposed by combining an analytical with an empirical model.…”
Section: Related Work
confidence: 99%
“…In [4], authors present defensive tiling, a technique to minimize cache misses in inclusion shared caches, when multiple programs run simultaneously. In [17], loop tiling is combined with cache partitioning to improve performance in shared caches.…”
Section: Related Work
confidence: 99%