Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming 2015
DOI: 10.1145/2688500.2688514

Cache-oblivious wavefront: improving parallelism of recursive dynamic programming algorithms without losing cache-efficiency

Abstract: State-of-the-art cache-oblivious parallel algorithms for dynamic programming (DP) problems usually guarantee asymptotically optimal cache performance without any tuning of cache parameters, but they often fail to exploit the theoretically best parallelism at the same time. While these algorithms achieve cache-optimality through the use of a recursive divide-and-conquer (DAC) strategy, scheduling tasks at the granularity of task dependency introduces artificial dependencies in addition to those arising from the…
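To make the abstract's point concrete, below is a minimal sketch (our illustration, not code from the paper; the input strings, function names, and base-case cutoff are made up) of the fork-join divide-and-conquer schedule it describes, applied to an LCS-style DP table. The join before the bottom-right quadrant is exactly the kind of artificial dependency at issue: cells near the top-left corner of Q22 become ready as soon as the adjacent cells of Q12 and Q21 are done, yet fork-join makes them wait for both quadrants in full.

```cpp
#include <algorithm>
#include <cstdio>
#include <future>
#include <string>
#include <vector>

static const std::string A = "GATTACA", B = "TACGATCA";  // made-up inputs
static std::vector<std::vector<int>> C;  // C[i][j], 1-indexed over A and B

// Base case: fill cells [ilo,ihi) x [jlo,jhi) serially in row-major order.
static void base(int ilo, int ihi, int jlo, int jhi) {
  for (int i = ilo; i < ihi; ++i)
    for (int j = jlo; j < jhi; ++j)
      C[i][j] = (A[i-1] == B[j-1]) ? C[i-1][j-1] + 1
                                   : std::max(C[i-1][j], C[i][j-1]);
}

// Quadrant order: Q11 first; then Q12 and Q21 in parallel; then Q22.
static void rec(int ilo, int ihi, int jlo, int jhi) {
  if (ihi - ilo <= 2 || jhi - jlo <= 2) { base(ilo, ihi, jlo, jhi); return; }
  int im = (ilo + ihi) / 2, jm = (jlo + jhi) / 2;
  rec(ilo, im, jlo, jm);                                    // Q11
  auto q12 = std::async(std::launch::async, rec, ilo, im, jm, jhi);
  rec(im, ihi, jlo, jm);                                    // Q21, with Q12
  q12.get();  // artificial barrier: early cells of Q22 were ready long ago
  rec(im, ihi, jm, jhi);                                    // Q22
}

int main() {
  C.assign(A.size() + 1, std::vector<int>(B.size() + 1, 0));
  rec(1, (int)A.size() + 1, 1, (int)B.size() + 1);
  std::printf("LCS length = %d\n", C[A.size()][B.size()]);
}
```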

Cited by 29 publications (28 citation statements)
References 53 publications
“…For the problems that we consider in this paper, the parallel DP algorithms were already discussed in a rich literature in the eighties and nineties (e.g., [49,51,42,58,57,72]). Later work not only considers parallelism, but also optimizes symmetric cache complexity (e.g., [46,34,36,31,20,60,77,74,75,41,73,32]). Algorithms in linear algebra that share similar computation structures (but with different orders of computation) are also discussed (e.g., [36,41,83,78,25,40,11,65]).…”
Section: Preliminaries and Related Work
confidence: 99%
“…For other problems (GAP, RNA, protein accordion folding, knapsack), the bounds in the symmetric setting are also improved. Some previous work [75,41] achieves linear span for several problems. We note that they assume a much stronger model to guarantee the sequential and parallel execution order, so their algorithms need specially designed schedulers [41,30].…”
Section: Preliminaries and Related Work
confidence: 99%
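For intuition on where the linear span comes from, the sketch below (ours, not from any of the cited papers; a plain wavefront traversal, not the cache-oblivious wavefront algorithm itself, and the inputs and helper names are made up) sweeps the same kind of 2D DP table by anti-diagonals. All cells on one anti-diagonal are mutually independent, so the span is Θ(m + n) diagonal steps, with no barriers beyond the one ending each step.

```cpp
#include <algorithm>
#include <cstdio>
#include <future>
#include <string>
#include <vector>

int main() {
  const std::string A = "GATTACA", B = "TACGATCA";  // made-up inputs
  const int m = (int)A.size(), n = (int)B.size();
  std::vector<std::vector<int>> C(m + 1, std::vector<int>(n + 1, 0));
  // Sweep anti-diagonals d = i + j; cells on one diagonal are independent.
  for (int d = 2; d <= m + n; ++d) {
    const int ilo = std::max(1, d - n), ihi = std::min(m, d - 1);
    auto fill = [&](int lo, int hi) {       // fill cells i in [lo,hi]
      for (int i = lo; i <= hi; ++i) {
        const int j = d - i;
        C[i][j] = (A[i-1] == B[j-1]) ? C[i-1][j-1] + 1
                                     : std::max(C[i-1][j], C[i][j-1]);
      }
    };
    // Two-way split of the diagonal; a real runtime would use work stealing.
    const int mid = (ilo + ihi) / 2;
    auto half = std::async(std::launch::async, fill, ilo, mid);
    fill(mid + 1, ihi);
    half.get();  // the only synchronization per diagonal step
  }
  std::printf("LCS length = %d\n", C[m][n]);
}
```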
“…This overhead may be reduced to some extent by falsely reporting a greater size for Cholesky tasks at lower levels of recursion, forcing the SB scheduler to be more aggressive about load balancing. Further, Cholesky factorization could also achieve better performance by relaxing the false dependencies introduced when the algorithm is expressed in the fork-join paradigm, using techniques recently introduced by Tang et al. [2015]. This would reduce the depth of the algorithm to O(n/L) and remove all serial points in the DAG except the start and the end.…”
Section: Algorithms
confidence: 99%
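To illustrate the false dependencies this snippet refers to, here is a small tiled right-looking Cholesky sketch in plain fork-join style (our own illustration, not from Tang et al. [2015] or the SB scheduler; the tile sizes, kernels, and test matrix are made up and simplified). Each step k ends in full joins after the triangular solves and again after the trailing updates, even though the true tile-level DAG would, for example, let the step-(k+1) diagonal factorization begin as soon as its own tile is updated; relaxing those joins is what shrinks the critical path toward O(n/L).

```cpp
#include <cmath>
#include <cstdio>
#include <future>
#include <vector>

constexpr int B = 4, T = 3, N = B * T;    // tile size, tiles per side
using Tile = std::vector<double>;         // row-major B x B tile
std::vector<Tile> A(T * T, Tile(B * B));  // tile (i,j) stored at A[i*T+j]
double& at(int i, int j, int r, int c) { return A[i*T+j][r*B+c]; }

void potrf(int k) {                       // in-place Cholesky of tile (k,k)
  for (int j = 0; j < B; ++j) {
    for (int p = 0; p < j; ++p) at(k,k,j,j) -= at(k,k,j,p) * at(k,k,j,p);
    at(k,k,j,j) = std::sqrt(at(k,k,j,j));
    for (int i = j + 1; i < B; ++i) {
      for (int p = 0; p < j; ++p) at(k,k,i,j) -= at(k,k,i,p) * at(k,k,j,p);
      at(k,k,i,j) /= at(k,k,j,j);
    }
  }
}
void trsm(int i, int k) {                 // tile (i,k) <- (i,k) * L(k,k)^-T
  for (int r = 0; r < B; ++r)
    for (int c = 0; c < B; ++c) {
      for (int p = 0; p < c; ++p) at(i,k,r,c) -= at(i,k,r,p) * at(k,k,c,p);
      at(i,k,r,c) /= at(k,k,c,c);
    }
}
void update(int i, int j, int k) {        // tile (i,j) -= (i,k) * (j,k)^T
  for (int r = 0; r < B; ++r)
    for (int c = 0; c < B; ++c)
      for (int p = 0; p < B; ++p) at(i,j,r,c) -= at(i,k,r,p) * at(j,k,c,p);
}

int main() {
  // Symmetric positive-definite test matrix (diagonally dominant);
  // only the lower triangle is stored and factored.
  for (int i = 0; i < N; ++i)
    for (int j = 0; j <= i; ++j)
      at(i/B, j/B, i%B, j%B) = (i == j) ? N + 1.0 : 1.0 / (1.0 + i + j);
  for (int k = 0; k < T; ++k) {
    potrf(k);                             // serial point at every step
    std::vector<std::future<void>> fs;
    for (int i = k + 1; i < T; ++i)
      fs.push_back(std::async(std::launch::async, trsm, i, k));
    for (auto& f : fs) f.get();           // join #1: full barrier
    fs.clear();
    for (int j = k + 1; j < T; ++j)
      for (int i = j; i < T; ++i)
        fs.push_back(std::async(std::launch::async, update, i, j, k));
    for (auto& f : fs) f.get();           // join #2: full barrier
  }
  std::printf("L(0,0) = %f (expect sqrt(%d))\n", at(0,0,0,0), N + 1);
}
```

In this baseline the two joins per step make the critical path grow with the number of steps times the per-step barrier cost; the technique the snippet describes replaces the barriers with the exact tile-level dependencies, leaving only the start and end of the DAG as serial points.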