2017
DOI: 10.1145/3040221
Resource Oblivious Sorting on Multicores

Abstract: We present a deterministic sorting algorithm, SPMS (Sample, Partition, and Merge Sort), that interleaves the partitioning of a sample sort with merging. Sequentially, it sorts n elements in O(n log n) time cache-obliviously with an optimal number of cache misses. The parallel complexity (or critical path length) of the algorithm is O(log n · log log n), which improves on previous bounds for optimal cache-oblivious sorting. The algorithm also has low false sharing costs. When scheduled by a work-stealing schedu…
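The abstract describes interleaving a sample sort's partitioning with merging. The following is a minimal illustrative sketch of the sample-and-partition skeleton only, not the actual SPMS algorithm: it picks a random sample, uses evenly spaced sample elements as pivots, partitions into buckets, and recurses. All names and parameters (base, oversample) are ours, chosen for illustration.

```python
import bisect
import random

def sample_sort(a, base=32, oversample=4):
    """Illustrative sample-and-partition sort (a sketch, not SPMS itself).

    SPMS additionally interleaves partitioning with merging to achieve
    its cache-oblivious and O(log n * log log n) critical-path bounds;
    this sketch only shows the sample/partition/recurse skeleton.
    """
    if len(a) <= base:
        return sorted(a)
    k = max(2, int(len(a) ** 0.5))                      # ~sqrt(n) buckets
    sample = sorted(random.sample(a, min(len(a), k * oversample)))
    # Take every oversample-th sample element as a pivot (k - 1 pivots).
    pivots = sample[oversample - 1 :: oversample][: k - 1]
    buckets = [[] for _ in range(len(pivots) + 1)]
    for x in a:
        # bisect_left finds the first bucket whose pivot is >= x.
        buckets[bisect.bisect_left(pivots, x)].append(x)
    # Degenerate split (e.g. all elements equal): fall back to sorted().
    if max(len(b) for b in buckets) == len(a):
        return sorted(a)
    out = []
    for b in buckets:
        out.extend(sample_sort(b, base, oversample))
    return out
```

With random pivots the buckets are balanced in expectation, so the recursion depth is O(log n) with high probability; the deterministic SPMS achieves its bounds without randomization.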

Cited by 28 publications (59 citation statements) · References 30 publications
“…Notice that ω is a parameter of the main memory, instead of a cache parameter, so the algorithms can be aware of it. One can define resource-obliviousness [37] so that the value of ω is not exposed to the algorithms, but this is out of the scope of this paper.…”
Section: Asymmetric Cache Complexity (confidence: 99%)
“…By applying the analysis in [14] with the change that the base cases (for the recursive sort and the transpose) are when the size fits in the ephemeral memory, and that the base case is done sequentially, we obtain the following theorem. It is possible that the log n term in the depth could be reduced using a sort by Cole and Ramachandran [23].…”
Section: Fault-Tolerant Algorithms (confidence: 99%)
“…Although not explicitly stated, this observation appears to need the assumption that all variables have a fixed allocation, regardless of the amount of parallelism. In [17], we showed how to account for dynamically allocated variables (the issue being that block boundaries on the execution stack for a stolen task can diverge from those at the parent task), giving a bound of O(Q + S · M/B) on C(S). Frigo and Strumpen [20] considered the above set-up for computations where any fragment of size r that occurs in a parallel execution incurs O(f (r)) cache misses, for some concave function f .…”
Section: Related Work (confidence: 99%)
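The snippet above quotes the cache-miss bound O(Q + S · M/B) for a work-stealing execution: Q is the sequential cache-miss count, and each of the S steals can force at most M/B extra block reloads (a cache of size M holds M/B blocks of size B). The arithmetic can be made concrete with a small illustrative helper; the function name and the sample numbers are ours, not from the cited works.

```python
def extra_cache_miss_bound(Q, S, M, B):
    """Upper bound on cache misses under work stealing: the sequential
    miss count Q plus, per steal, a full reload of the M/B cache blocks."""
    return Q + S * (M // B)

# Illustrative numbers: 1M sequential misses, 10 steals,
# a 64 KiB cache with 64-byte blocks (1024 blocks).
bound = extra_cache_miss_bound(1_000_000, 10, 64 * 1024, 64)
```

The point of the bound is that the steal penalty S · M/B is additive, so a scheduler that keeps the number of steals small keeps the parallel cache cost close to the sequential cost Q.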
“…Another reason for considering a general scheduler is to obtain 'oblivious' results as in sequential cache-oblivious algorithms [19], network-oblivious algorithms for distributed memory [5], and multicore-oblivious [13] and resource-oblivious [17,15] algorithms for shared memory multicores. In all of these cases the desire is to have algorithms analyzed in a machine-independent manner so that bounds hold across diverse platforms.…”
Section: Related Work (confidence: 99%)