This paper addresses performance-portability and overall performance issues that arise when derived datatypes are used with four MPI implementations: Open MPI, MPICH, MVAPICH2, and Intel MPI. These comparisons are particularly relevant today because most vendor implementations are now based on Open MPI or MPICH rather than on vendor-proprietary code, as was more common in the past. We find that, within a single MPI implementation, there are significant differences in performance across the various reasonable encodings of a derived datatype permitted by the MPI standard. While this finding may not be surprising, it is important to understand how fundamental versus arbitrary choices made in early implementations affect the use of derived datatypes today. A more significant finding is that one cannot reliably choose a single derived datatype formulation and expect uniform performance portability across these four implementations. That is, the best-performing path under one of the MPI code bases is not the best-performing path under another; users must be prepared to recode to a different formulation to move efficiently among MPICH, MVAPICH2, Intel MPI, and Open MPI. This lack of uniformity represents a significant gap in MPI's fundamental purpose of offering performance portability. Specific examination of internal implementation details indicates why performance differs among the implementations. Proposed solutions to this problem include i) revamping datatypes; ii) providing a common, underlying datatype standard shared by multiple MPI implementations; and iii) exploring new ways to describe derived datatypes that are optimizable by modern networks and faster than MPI implementations' software-based marshaling and unmarshaling.