Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming 2009
DOI: 10.1145/1504176.1504213

MPIWiz

Abstract: Message Passing Interface (MPI) is a widely used standard for managing coarse-grained concurrency on distributed computers. Debugging parallel MPI applications, however, has always been a particularly challenging task due to their high degree of concurrent execution and non-deterministic behavior. Deterministic replay is a potentially powerful technique for addressing these challenges, with existing MPI replay tools adopting either data-replay or order-replay approaches. Unfortunately, each approach has its tr…

Cited by 36 publications (2 citation statements)
References 36 publications (41 reference statements)
“…State-of-the-art record-and-replay tools such as ReMPI (Sato et al, 2015) target production-scale runs and prioritize scalability in terms of runtime and record size. Other record-and-replay tools target hybrid MPI + OpenMP executions (Budanur et al, 2012), MPI applications using one-sided communication (Qian et al, 2016b,a), replay of isolated subgroups of processes (Xue et al, 2009), and probabilistic replay (Park et al, 2009). In addition, tools such as NINJA (Sato et al, 2017) are used in conjunction with record-and-replay tools to improve the chances of capturing nondeterministic bugs.…”
Section: Software Solutions For Nondeterministic Executions (mentioning)
confidence: 99%
“…Xue et al proposed Subgroup Reproducible Replay (SRR) [54] to address not only the problem of trace size, but also provide the user with flexible replay options (i.e. replaying only those processes deemed relevant to the debugging task).…”
Section: Debugging-centric Techniques (mentioning)
confidence: 99%