Ching-Chen Ma scite author profile

Abstract: Finite-difference, stencil-based discretization approaches are widely used to solve partial differential equations that describe physical phenomena. Newton-Krylov iterative methods commonly used in stencil-based solutions generate matrices that exhibit diagonal sparsity patterns. To exploit these structures on modern GPUs, we extend the standard diagonal sparse matrix representation and define new matrix and vector data types in the PETSc parallel numerical toolkit. After identifying a number of GPU-specific optimizations and tuning parameters for the operations associated with these types, we create tunable CUDA implementations of those operations. We discuss our implementation of GPU autotuning capabilities in the Orio framework and present performance results for several kernels, comparing them with vendor-tuned library implementations.
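The diagonal sparsity pattern mentioned above is what the standard diagonal (DIA) format exploits: each nonzero diagonal is stored as a dense array together with its offset from the main diagonal. The following is a minimal CPU-only sketch of the standard DIA layout and its matrix-vector product, assuming a square matrix with row-aligned diagonal storage. `DiaMatrix` and `dia_spmv` are hypothetical names for illustration; they are not the PETSc data types, the extended representation, or the CUDA kernels described in the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DiaMatrix:
    """Hypothetical square matrix in standard diagonal (DIA) storage.

    offsets[d] is the offset of stored diagonal d from the main diagonal
    (0 = main, negative = below, positive = above). data[d][i] holds
    A[i][i + offsets[d]]; slots that fall outside the matrix are padding.
    """
    n: int
    offsets: List[int]
    data: List[List[float]]


def dia_spmv(a: DiaMatrix, x: List[float]) -> List[float]:
    """Compute y = A x one row at a time.

    On a GPU, row i would typically map to one thread, so consecutive
    threads read consecutive entries of each diagonal array.
    """
    y = [0.0] * a.n
    for i in range(a.n):
        acc = 0.0
        for off, diag in zip(a.offsets, a.data):
            j = i + off
            if 0 <= j < a.n:  # skip padding slots outside the matrix
                acc += diag[i] * x[j]
        y[i] = acc
    return y


# 1-D three-point Laplacian stencil [-1, 2, -1] on 5 grid points.
n = 5
lap = DiaMatrix(n=n, offsets=[-1, 0, 1],
                data=[[-1.0] * n, [2.0] * n, [-1.0] * n])
y = dia_spmv(lap, [1.0, 2.0, 3.0, 4.0, 5.0])
print(y)  # [0.0, 0.0, 0.0, 0.0, 6.0]
```

A three-point stencil yields exactly three stored diagonals, so the format needs only one offset per diagonal instead of one column index per nonzero, as a CSR-style format would.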

Use of intuitive tools to enhance student learning and user experience

et al. 2009

Most user interfaces today present system functions through verbal or iconic symbols on static 2D menu pages organized in a hierarchy [1]. This is not natural to people and must be learned, which creates a barrier to the full use and understanding of computer systems. With this problem in mind, we set out to build a website and collaborative application for the International Children's Center (ICC) that could be used not only across languages but also across ages. The effort was tested daily by a multinational team of students from the United States and Turkey, as well as by children from both countries. This attention to usability will not only produce an intuitive tool for the client but also teach the students in the course how to build intuitive user interfaces.

Ownership passing

Friedley, Hoefler, Bronevetsky et al. 2013

SIGPLAN Not.

5

The number of cores in multi- and many-core high-performance processors is steadily increasing. MPI, the de facto standard for programming high-performance computing systems, offers a distributed memory programming model. MPI's semantics force a copy from one process's send buffer to another process's receive buffer. This makes it difficult to match, on modern hardware, the performance of shared memory programs, which are arguably harder to maintain and debug. We propose generalizing MPI's communication model to include ownership passing, which makes it possible to fully leverage the shared memory hardware of multi- and many-core CPUs to stream communicated data concurrently with the receiver's computations on it. The benefits and simplicity of message passing are retained by extending MPI with calls to send (pass) ownership of memory regions, instead of their contents, between processes. Ownership passing is achieved with a hybrid MPI implementation that runs MPI processes as threads and is mostly transparent to the user. We propose an API and a static analysis technique that transform legacy MPI codes automatically and transparently to the programmer, demonstrating that this scheme is easy to use in practice. Using the ownership passing technique, we see communication speedups of up to 51% over a standard message passing implementation on state-of-the-art multicore systems. Our analysis and interface will lay the groundwork for future development of MPI-aware optimizing compilers and multi-core-specific optimizations, which will be key for success on current and next-generation computing platforms.
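The difference between copy semantics and ownership passing can be illustrated with a small thread-based sketch. It assumes, as the abstract describes, that ranks run as threads in one address space. `OwnedBuffer`, `send_copy`, `pass_ownership`, `receive`, and `run` are hypothetical helpers built on Python's standard `queue` and `threading` modules; they are not the MPI extension API proposed in the paper, and the sketch does not model streaming data concurrently with the receiver's computation.

```python
import queue
import threading

# Hypothetical sketch, not an MPI API: ranks are threads sharing memory.


class OwnedBuffer:
    """Hypothetical buffer wrapper that records which rank owns it."""

    def __init__(self, owner: int, data: bytearray) -> None:
        self.owner = owner
        self.data = data


def send_copy(chan: queue.Queue, buf: OwnedBuffer, dest: int) -> None:
    """Copy semantics: the receiver gets a fresh buffer with the same bytes."""
    chan.put(OwnedBuffer(dest, bytearray(buf.data)))


def pass_ownership(chan: queue.Queue, buf: OwnedBuffer, dest: int) -> None:
    """Ownership passing: hand over the buffer object itself, no byte copy.

    After this call the sender must treat buf as no longer its own.
    """
    buf.owner = dest
    chan.put(buf)


def receive(chan: queue.Queue, me: int) -> OwnedBuffer:
    """Block until a buffer arrives and check that it now belongs to `me`."""
    buf = chan.get()
    if buf.owner != me:
        raise RuntimeError("received a buffer owned by another rank")
    return buf


def run(transfer) -> tuple:
    """Run a sender thread (rank 0) and a receiver thread (rank 1).

    Returns (sent buffer, received buffer, sum computed by the receiver).
    """
    chan: queue.Queue = queue.Queue()
    out = {}

    def rank0() -> None:
        buf = OwnedBuffer(owner=0, data=bytearray(range(8)))
        out["sent"] = buf  # kept only so the demo can inspect it afterwards
        transfer(chan, buf, 1)

    def rank1() -> None:
        buf = receive(chan, me=1)
        out["received"] = buf
        out["sum"] = sum(buf.data)

    threads = [threading.Thread(target=rank0), threading.Thread(target=rank1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out["sent"], out["received"], out["sum"]


sent, received, total = run(send_copy)
print(total, received.data is sent.data)  # 28 False: bytes were copied

sent, received, total = run(pass_ownership)
print(total, received.data is sent.data)  # 28 True: same memory, new owner
```

In this sketch the rule that rank 0 must not touch a buffer after passing it is only a convention; nothing in the code enforces it.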

Working across time zones in cross-cultural student teams

et al. 2009

SIGCSE Bull.

The ability to collaborate with fellow workers from different cultures on international projects is a key asset in today's job market. International projects add new dimensions to student teamwork: they give students the opportunity to participate in collaboration that is remote, cross-cultural, and linguistically challenging. This proposal examines an international term project completed by computing students at Rose-Hulman Institute of Technology and Bilkent University.

Use of intuitive tools to enhance student learning and user experience

et al. 2009

SIGCSE Bull.

Ownership passing

Friedley, Hoefler, Bronevetsky et al. 2013

19
