Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems - 1991
DOI: 10.1145/106972.106981

The cache performance and optimizations of blocked algorithms

Abstract: Blocking is a well-known optimization technique for improving the effectiveness of memory hierarchies. Instead of operating on entire rows or columns of an array, blocked algorithms operate on submatrices or blocks, so that data loaded into the faster levels of the memory hierarchy are reused. This paper presents cache performance data for blocked programs and evaluates several optimizations to improve this performance. The data is obtained by a theoretical model of data conflicts in the cache, which has been v…
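
To make the idea of operating on blocks rather than whole rows or columns concrete, here is a minimal sketch of a blocked matrix multiplication next to the unblocked version. The function names, the `block` parameter, and the particular loop order are assumptions chosen for illustration, not code taken from the paper. Pure Python lists will not show the cache benefit; the sketch only illustrates the loop structure.

```python
def matmul_naive(a, b):
    """Unblocked product: each c[i][j] sweeps a full row of a and a full column of b."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            total = 0.0
            for k in range(m):
                total += a[i][k] * b[k][j]
            c[i][j] = total
    return c


def matmul_blocked(a, b, block=2):
    """Blocked product (illustrative): work on one block-by-block submatrix of b at a time.

    The two outer loops pick a submatrix of b (rows kk.., columns jj..).
    That submatrix is then reused for every row i of a before moving on,
    which is the reuse that blocking is meant to expose to the cache.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for kk in range(0, m, block):
        k_end = min(kk + block, m)
        for jj in range(0, p, block):
            j_end = min(jj + block, p)
            for i in range(n):
                row_a, row_c = a[i], c[i]
                for k in range(kk, k_end):
                    r = row_a[k]
                    row_b = b[k]
                    for j in range(jj, j_end):
                        row_c[j] += r * row_b[j]
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(matmul_blocked(a, b, block=2))
```

Both functions compute the same product; only the order in which elements are visited changes. In a compiled language, the block size would typically be chosen so that the reused submatrix fits in the cache.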

Cited by 605 publications (132 citation statements)
References 8 publications

“…Improving locality has also become one of the goals of algorithmic design, in the interest of better using the memory hierarchy [423,28,334]. When considered at the granularity of pages, locality lies at the basis of the working set model and of paging algorithms [171,174,172].…”
Section: Spatial and Temporal Locality; type: mentioning
confidence: 99%
“…A more efficient way to distribute data among parallel processors is to use blocked partitioning, that is to divide in more than one dimension the data structures [9,21]. Blocked solutions are frequently adopted in numerical operations because this strategy enables a better memory access and cache use for the applications.…”
Section: Blocked Parallel Algorithms; type: mentioning
confidence: 99%
“…A number of groups are attempting to improve performance through architectural innovation [18]. Other groups are attacking the problem in software: either in the compiler-level through reordering instructions and prefetching [9,10], or through complex data layouts to improve cache performance [2,16].…”
Section: Related Work; type: mentioning
confidence: 99%
“…Nonserial polyadic dynamic programming algorithms, on the other hand, pose unique challenges to improving cache performance due to their irregular data access patterns. These challenges are significantly different from those faced in the dense linear algebra problems, which are often easily handled using standard cache-friendly optimizations such as tiling or blocking [4,10]. Optimizations such as tiling can be applied to nonserial polyadic dynamic programming only after considering these specific details individually.…”
Section: Related Work; type: mentioning
confidence: 99%