Proceedings of the Eighteenth Annual ACM Symposium on Parallelism in Algorithms and Architectures 2006
DOI: 10.1145/1148109.1148157

The cache complexity of multithreaded cache oblivious algorithms

Abstract: We present a technique for analyzing the number of cache misses incurred by multithreaded cache oblivious algorithms on an idealized parallel machine in which each processor has a private cache. We specialize this technique to computations executed by the Cilk work-stealing scheduler on a machine with dag-consistent shared memory. We show that a multithreaded cache oblivious matrix multiplication incurs $O(n^3/\sqrt{Z} + (Pn)^{1/3} n^2)$ cache misses when executed by the Cilk scheduler on a machine with P processors…

Cited by 44 publications (65 citation statements)
References 27 publications
“…1 shows, for example, the memory hierarchy for the Xeon based Nehalem architecture, the current generation of desktop architecture from Intel. Correspondingly there has been significant recent work on parallel cache based locality [1,8,5,19,11,7,4,12,23,9,13]. The work has fallen into two main classes.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Dynamic parallelism can be further divided between approaches in which the user analyzes their algorithm in an abstract model that knows nothing about the scheduler and its interaction with the machine [1,8,19,11,9], and approaches in which the analysis requires an integrated analysis [7,12,13,15,14]. In the first class the scheduler is required to supply a general mapping from an abstract cost model to the particular machine models.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Using results from [1,18] (e.g., Theorem 2 and equation (4) of [18]) one can show that I-GEP incurs O(…”
Section: B) Cache Misses When Executing Multithreaded I-GEP On a Mac
Citation type: mentioning
confidence: 99%