Abstract: In this paper, we present a new methodology that provides (i) a theoretical analysis of the two most commonly used approaches for effective shared cache management (i.e., cache partitioning and loop tiling) and (ii) a unified framework for fine-tuning those two mechanisms in tandem (not separately). Our approach lowers the number of main memory accesses by an order of magnitude while keeping the number of arithmetic/addressing instructions at a minimal level. We also present a sea…
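To make the loop tiling half of the abstract concrete, here is a minimal C sketch of a tiled (blocked) matrix multiplication. The matrix size N and tile size TILE are illustrative placeholders, not values derived by the paper's methodology, which picks such parameters from the cache size and associativity.

```c
/* Minimal loop-tiling sketch: blocked matrix multiplication.
 * N and TILE are illustrative; the paper derives tile sizes from the
 * cache hierarchy rather than fixing them by hand. */
#include <stddef.h>

#define N    1024
#define TILE 64   /* assumption: three TILE x TILE blocks fit in the target cache */

void matmul_tiled(const double A[N][N], const double B[N][N], double C[N][N])
{
    for (size_t ii = 0; ii < N; ii += TILE)
        for (size_t kk = 0; kk < N; kk += TILE)
            for (size_t jj = 0; jj < N; jj += TILE)
                /* work on one cache-resident block of each array at a time */
                for (size_t i = ii; i < ii + TILE; i++)
                    for (size_t k = kk; k < kk + TILE; k++)
                        for (size_t j = jj; j < jj + TILE; j++)
                            C[i][j] += A[i][k] * B[k][j];
}
```

Tiling alone reduces main memory traffic; the paper's point is that tile sizes must be chosen together with the cache partitioning, not separately.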
“…This method is applicable to all modern single-core and shared-cache multi-core CPUs. Regarding shared-cache processors, we use the software shared cache partitioning method given in our previous work [8]. No more than p threads can run in parallel (one to each core), where p is the number of the processing cores (single-threaded codes only).…”
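The snippet's "at most p threads, one per core" constraint can be illustrated with Linux CPU affinity. This sketch is for illustration only and does not reproduce the software cache partitioning method of [8]; it merely pins the calling thread to a given core.

```c
/* Illustration of the "at most p threads, one per core" constraint using
 * Linux CPU affinity. This does NOT implement the cache-partitioning
 * method of [8]; it only pins the calling thread to one core. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <unistd.h>

long num_cores(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);   /* p = number of processing cores */
}

void pin_to_core(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &set);
}
```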
Section: Proposed Methodology (mentioning)
confidence: 99%
“…Type1_L2acc. = array_size × t_i + offset (8), where array_size is the size of the array and offset gives the number of L2 accesses of the new loop kernel added in the case the data array layout is transformed. t_i gives how many times the corresponding array is accessed from L2 memory and is given by Eq.…”
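Cleaned up, Eq. (8) is straightforward to evaluate. In the sketch below, t_i and offset are plain inputs, since they are computed by other equations of the paper that this snippet does not quote.

```c
/* Sketch of Eq. (8): Type1 L2 accesses = array_size * t_i + offset.
 * t_i and offset come from equations not quoted here, so they are
 * plain inputs to this function. */
unsigned long long type1_l2_accesses(
    unsigned long long array_size, /* size of the array (elements)            */
    unsigned long long t_i,        /* times the array is accessed from L2     */
    unsigned long long offset)     /* extra accesses if the layout transforms */
{
    return array_size * t_i + offset;
}
```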
Section: Couple Execution Behaviour to Co-Processor Architecture and I… (mentioning)
confidence: 99%
“…In the case that the target metric is not ET or E, but the minimum number of L_i memory accesses, then Algorithm 1 is changed accordingly, i.e., only steps (1,2,5,8), (1,3,5,8) or (1,4,5,8) are executed, respectively. It is important to note that in this case the number of different schedules that have to be further processed by Subsection 3.2 is smaller, i.e., the lower bound values of Eq.…”
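The quoted step subsets map naturally onto a dispatch table. In this hedged sketch, the step numbers (1,2,5,8), (1,3,5,8) and (1,4,5,8) come from the snippet; the step bodies and the assumption that the ET/E path runs every step are placeholders.

```c
/* Sketch of the metric-dependent step selection described above.
 * The subsets are quoted from the text; run_step() and the
 * all-steps path for ET/E are placeholders. */
#include <stdio.h>
#include <stddef.h>

enum metric { MIN_L1_ACC, MIN_L2_ACC, MIN_DDR_ACC, EXEC_TIME_OR_ENERGY };

static void run_step(int s) { printf("step %d\n", s); /* placeholder body */ }

void run_algorithm1(enum metric m)
{
    static const int l1[]  = {1, 2, 5, 8};
    static const int l2[]  = {1, 3, 5, 8};
    static const int ddr[] = {1, 4, 5, 8};
    static const int all[] = {1, 2, 3, 4, 5, 6, 7, 8};  /* assumption for ET/E */

    const int *steps;
    size_t n;
    switch (m) {
    case MIN_L1_ACC:  steps = l1;  n = 4; break;
    case MIN_L2_ACC:  steps = l2;  n = 4; break;
    case MIN_DDR_ACC: steps = ddr; n = 4; break;
    default:          steps = all; n = 8; break;
    }
    for (size_t i = 0; i < n; i++)
        run_step(steps[i]);
}
```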
The key to optimizing software is the correct choice, order, as well as parameters of optimizations-transformations, which has remained an open problem in compilation research for decades for various reasons. First, most of the compilation subproblems-transformations are interdependent and thus addressing them separately is not effective. Second, it is very hard to couple the transformation parameters to the processor architecture (e.g., cache size and associativity) and algorithm characteristics (e.g., data reuse); therefore compiler designers and researchers either do not take them into account at all or do so only partly. Third, the search space (all different transformation parameters) is very large and thus searching is impractical. In this paper, the above problems are addressed for data-dominant affine loop kernels, delivering significant contributions. A novel methodology is presented that takes as input the underlying architecture details and algorithm characteristics and outputs the near-optimum parameters of six code optimizations in terms of either L1, L2, or DDR accesses, execution time, or energy consumption. The proposed methodology has been evaluated on both embedded and general-purpose processors and on 6 well-known algorithms, achieving high speedup as well as energy consumption gains over the gcc compiler, hand-written optimized code, and Polly.
“…Therefore, a wide-ranging literature survey is presented on the proper utilization of storage subsystems and energy-aware scheduling algorithms and their link within a multi-core heterogeneous cloud computing environment. In [11], an algorithm for the efficient management of shared caches and their effective partitioning is presented to reduce main memory accesses in a cloud computing environment. This technique helps to minimize the arithmetic and addressing operations.…”
Section: Related Work (mentioning)
confidence: 99%
“…Various researchers have introduced different cache memory optimization techniques in the above literature. However, very few methods can be utilized in real time due to various problems like high overhead, high energy consumption, slower performance, and an inability to reduce cache memory usage [11, 12, 14, 17-19]. Thus, we have adopted a Cache Optimization Cloud Scheduling (COCS) Algorithm Based on Last Level Caches to ensure high cache memory optimization and to enhance the processing speed of the I/O subsystem in a cloud computing environment based on the Dynamic Voltage and Frequency Scaling (DVFS) technique.…”
Recently, the utilization of cloud services such as storage, software, and networking resources has grown enormously due to widespread worldwide demand for these services. On the other hand, this requires a huge amount of storage and resource management to cope with the ever-increasing demand, and the high demand for these cloud services can lead to high energy consumption in cloud centers. Therefore, to eliminate these drawbacks and improve energy consumption and storage in real time for cloud computing devices, we present the Cache Optimization Cloud Scheduling (COCS) Algorithm Based on Last Level Caches, which ensures high cache memory optimization and enhances the processing speed of the I/O subsystem in a cloud computing environment, relying on Dynamic Voltage and Frequency Scaling (DVFS). The proposed COCS technique helps to reduce last-level cache failures and average memory latencies in cloud computing multi-processor devices, and it provides an efficient mathematical model to minimize energy consumption. We tested our experiment on the Cybershake scientific dataset, and the experimental results are compared with different conventional techniques in terms of time taken to accomplish tasks, power consumed in the VMs, and average power required to handle tasks.
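The COCS abstract does not give its energy model, but DVFS-based schedulers typically build on the standard CMOS power relations. The sketch below shows those textbook formulas only; it is not the model of the cited paper.

```c
/* Textbook CMOS power/energy relations commonly underlying DVFS
 * schedulers; this is NOT the COCS model, whose details the abstract
 * does not give. */
double dynamic_power(double alpha, double cap, double volt, double freq)
{
    /* P_dyn = alpha * C * V^2 * f: activity factor, switched
       capacitance, supply voltage, clock frequency */
    return alpha * cap * volt * volt * freq;
}

double task_energy(double p_dyn, double p_static, double seconds)
{
    return (p_dyn + p_static) * seconds;   /* E = P * t */
}
```

Lowering V and f together shrinks the V²f term, which is why DVFS trades execution time for energy.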