2010
DOI: 10.1007/s00224-010-9273-8

The Cache-Oblivious Gaussian Elimination Paradigm: Theoretical Framework, Parallelization and Experimental Evaluation

Abstract: We consider triply-nested loops of the type that occur in the standard Gaussian elimination algorithm, which we denote by GEP (or the Gaussian Elimination Paradigm). We present two related cache-oblivious methods I-GEP and C-GEP, both of which reduce the number of cache misses incurred (or I/Os performed) by the computation over that performed by standard GEP by a factor of √M, where M is the size of the cache. Cache-oblivious I-GEP computes in-place and solves most of the known applications of GEP including …
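The triply-nested loop shape the abstract refers to can be sketched as follows. This is a minimal illustration of the plain iterative loop only, not of the cache-oblivious I-GEP or C-GEP methods. The function names (`gep`, `floyd_warshall`, `gaussian_elimination`), the update function `f`, and the predicate `in_sigma` are hypothetical names chosen for this sketch and are assumptions, not taken from the abstract.

```python
from math import inf


def gep(c, f, in_sigma):
    """Plain triply-nested GEP-style loop (illustrative sketch).

    c        : square matrix as a list of lists, updated in place
    f        : update function f(c_ij, c_ik, c_kj, c_kk) -> new c_ij
    in_sigma : predicate selecting which (i, j, k) triples are updated
    """
    n = len(c)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if in_sigma(i, j, k):
                    c[i][j] = f(c[i][j], c[i][k], c[k][j], c[k][k])
    return c


def floyd_warshall(dist):
    """All-pairs shortest paths as one instance: every triple is updated."""
    return gep(dist, lambda x, u, v, w: min(x, u + v), lambda i, j, k: True)


def gaussian_elimination(a):
    """Elimination without pivoting (assumes nonzero pivots).

    Only entries with i > k and j > k are updated, so the upper
    triangle ends up holding the eliminated form.
    """
    return gep(a, lambda x, u, v, w: x - (u / w) * v,
               lambda i, j, k: i > k and j > k)


if __name__ == "__main__":
    d = [[0, 3, inf], [inf, 0, 1], [2, inf, 0]]
    print(floyd_warshall(d))
```

Both instances share the same loop nest and differ only in the update function and in which index triples are touched, which is why a single cache-efficient schedule for the loop nest can serve several problems at once.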

Citations: cited by 45 publications (63 citation statements)
References: 25 publications (45 reference statements)
“…In most of these computations, each process also has full knowledge about its future request sequence of its current task. For instance, the computation of Gaussian elimination paradigm as discussed by Chowdhury and Ramachandran [9] has this type of behavior. Even the computation of matrix multiplication and fast Fourier transform have this type of behavior.…”
Section: Disjoint and Shared Memory Framework (mentioning)
confidence: 97%
“…In several of such applications, processes also have perfect knowledge about the sequence of requests they plan to request in the future since they work on a well-defined computation like matrix multiplication or Gaussian elimination paradigm [9,10]. Observe that in these computations, the interleaving of requests from different processes reaching the shared cache still remains adversarial since the interleaving depends on factors like the difference in the clock period, interrupts from the operating systems, etc.…”
Section: Shared Memory Framework Description (mentioning)
confidence: 99%
“…The Gaussian elimination paradigm of Chowdhury and Ramachandran [13] provides a cache-oblivious framework for these problems, similar to Toledo's recursive blocked LU factorization [41]. Our APSP work is orthogonal to that of Chowdhury and Ramachandran in the sense we provide distributed memory algorithms that minimize internode communication (both latency and bandwidth), while their method focuses on cache-obliviousness and multithreaded (shared memory) implementation.…”
Section: Previous Work (mentioning)
confidence: 99%
“…However, their analysis is limited to the hierarchical divide-and-conquer problems and a moderate level of parallelism. Chowdhury and Ramachandran [9] consider cache-complexity in both private- and shared-cache models for matrix-based computations, including all-pairs shortest paths algorithm of Floyd-Warshall. They also consider parallel dynamic programming algorithms in private-, shared- and multicore-cache models [10].…”
Section: A Prior Related Work (mentioning)
confidence: 99%