2012
DOI: 10.1002/cpe.2974

Cache‐oblivious matrix algorithms in the age of multicores and many cores

Abstract: This article highlights the issue of upcoming wider single‐instruction, multiple‐data units as well as steadily increasing core counts on contemporary and future processor architectures. We present the recent port to and latest results of cache‐oblivious algorithms and implementations of our TifaMMy code on four architectures: SGI's UltraViolet distributed shared‐memory machine, Intel's latest x86 architecture code‐named Sandy Bridge, AMD's new Bulldozer architecture, and Intel's future Many Integrated …

Cited by 6 publications (4 citation statements)
References 19 publications
“…Our algorithm FUR-Hilbert, as discussed in Section 5, is implemented in C++ and compiled with gcc version 4.9.2. We compare our algorithm to the algorithm "TifaMMy" for matrix multiplication based on the Peano Curve introduced by Bader et al. [11], [12] (source code has been obtained by the authors compiled with icc version 16.0.3). Furthermore, we compare our algorithm to the Intel MKL library (https://software.intel.com/en-us/intel-mkl) version 11.3 (operation: dgemm), which is hardware- and hand-optimized specifically for Intel processors.…”
Section: Matrix Multiplication (mentioning)
confidence: 99%
“…The cache hit rate for "perf" is calculated as: $1 - \frac{\texttt{cache-misses:u}}{\texttt{cache-references:u}}$. We use the maximum number of threads (12) for the variation in problem size and matrices of size 10 000 are processed for the variation of threads. Figure 10 illustrates the cache hit rate for each cache level respectively and the cache hit rate for the entire cache hierarchy.…”
Section: Cache Hierarchy On Matrix Multiplication (mentioning)
confidence: 99%
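
The hit-rate formula quoted in the statement above can be sketched as a small helper. This is a minimal illustration, not code from the cited paper: the function name `cache_hit_rate` and the counter values are hypothetical, and the two arguments are assumed to be the raw `cache-misses:u` and `cache-references:u` counts reported by perf.

```python
def cache_hit_rate(cache_misses: int, cache_references: int) -> float:
    """Return 1 - misses / references, following the quoted formula.

    Both arguments are assumed to be raw event counts, e.g. the
    cache-misses:u and cache-references:u values printed by perf.
    """
    if cache_references <= 0:
        raise ValueError("cache_references must be positive")
    return 1.0 - cache_misses / cache_references


# Hypothetical counter values for illustration only (not measured data).
rate = cache_hit_rate(cache_misses=250, cache_references=1000)
print(rate)  # 0.75
```

The guard against a non-positive reference count is a design choice of this sketch; a run that records no cache references has no defined hit rate.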
“…‘Cache‐oblivious Matrix Algorithms in the Age of Multi‐ and Many‐Cores’ by Alexander Heinecke and Carsten Trinitis highlights the issue of increasing vector unit width that goes along with increasing core counts on x86 processor architectures. To demonstrate this, a cache‐oblivious numerical code has been ported to and optimized on four contemporary x86 architectures representing vector unit widths from 128 to 512 bits.…”
Section: This Special Issue (mentioning)
confidence: 99%