Hitoshi Murai scite author profile

The present paper introduces the XcalableACC (XACC) programming model, which is a hybrid model of the XcalableMP (XMP) Partitioned Global Address Space (PGAS) language and OpenACC. XACC defines directives that enable programmers to mix XMP and OpenACC directives in order to develop applications that can use accelerator clusters with ease. Moreover, in order to improve the performance of stencil applications, the Omni XACC compiler provides functions that can transfer a halo region on accelerator memory via Tightly Coupled Accelerators (TCA), which is a proprietary network for transferring data directly among accelerators. In the present paper, we evaluate the productivity and the performance of XACC through implementations of the HIMENO Benchmark. The results show that, thanks to the productivity improvements, XACC requires less than half the source lines of code compared to a combination of Message Passing Interface (MPI) and OpenACC, which are commonly used together as a typical programming model. As a result of the performance improvements, XACC using TCA achieved up to 2.7 times faster performance than the combination of OpenACC and MPI using GPUDirect RDMA over InfiniBand.

The K computer Operations: Experiences and Statistics

Yamamoto

Uno

Murai

et al. 2014

Procedia Computer Science

Implementation and evaluation of the HPC challenge benchmark in the XcalableMP PGAS language

Nakao

Murai

Iwashita

et al. 2017

The International Journal of High Performance Computing Applica

To improve productivity for developing parallel applications on high performance computing systems, the XcalableMP PGAS language has been proposed. XcalableMP supports both a typical parallelization under the "global-view memory model", which uses directives, and a flexible parallelization under the "local-view memory model", which uses coarray features. The goal of the present paper is to clarify XcalableMP's productivity and performance. To do so, we implement and evaluate the high performance computing challenge benchmark, namely, EP STREAM Triad, High Performance Linpack, Global fast Fourier transform, and RandomAccess, on the K computer using up to 16,384 compute nodes and on a generic cluster system using up to 128 compute nodes. We found that we could implement the benchmarks more easily using XcalableMP than using MPI. Moreover, most of the performance results using XcalableMP were almost the same as those using MPI.

14.9 TFLOPS Three-Dimensional Fluid Simulation for Fusion Science with HPF on the Earth Simulator

Sakagami

Murai

Seo

et al. 2002

We achieved 14.9 TFLOPS when running the plasma simulation code IMPACT-3D, parallelized with High Performance Fortran, on 512 nodes of the Earth Simulator. The theoretical peak performance of the 512 nodes is 32 TFLOPS, which means 45% of the peak performance was obtained with HPF. IMPACT-3D is an implosion analysis code using a TVD scheme, which performs three-dimensional compressible and inviscid Eulerian fluid computation with an explicit 5-point stencil scheme for spatial differentiation and a fractional time step for time integration. The mesh size is 2048x2048x4096, and the third dimension was distributed for the parallelization. The HPF system used in the evaluation is HPF/ES, developed for the Earth Simulator by enhancing NEC HPF/SX V2, mainly in communication scalability. Shift communications were manually tuned to get the best performance by using the HPF/JA extensions, which were designed to give users more control over sophisticated parallelization and communication optimizations.
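The 45% figure in the abstract above can be checked with a short calculation. This is a rough sketch, not taken from the paper: it assumes a peak of 64 GFLOPS per Earth Simulator node, a value the abstract does not state, and the variable names are illustrative only. Under that assumption the 512-node peak is 32.768 TFLOPS, which the abstract appears to round to 32 TFLOPS.

```python
# Sustained performance reported in the abstract, in TFLOPS.
sustained_tflops = 14.9

# Number of nodes used in the run, as reported in the abstract.
nodes = 512

# ASSUMPTION (not stated in the abstract): 64 GFLOPS peak per node.
peak_per_node_gflops = 64.0

# Aggregate theoretical peak for the nodes used, in TFLOPS.
peak_tflops = nodes * peak_per_node_gflops / 1000.0

# Fraction of peak that the sustained figure represents.
efficiency = sustained_tflops / peak_tflops

print(f"peak = {peak_tflops:.3f} TFLOPS")
print(f"efficiency = {efficiency:.1%}")
```

Dividing 14.9 by the rounded 32 TFLOPS would instead give roughly 46.6%, so the quoted 45% is consistent with the unrounded peak under the assumption above.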

Nearly degenerate wavelength-multiplexed polarization entanglement by cascaded optical nonlinearities in a PPLN ridge waveguide device

Arahira

Murai

2013

Opt. Express

In this paper we report the generation of wavelength-multiplexed polarization-entangled photon pairs in the 1.5-μm communication wavelength band by using cascaded second-order optical nonlinearities (sum-frequency generation and subsequent spontaneous parametric down-conversion, c-SFG/SPDC) in a periodically poled LiNbO₃ ridge waveguide device. The c-SFG/SPDC method makes it possible to fully use the broad spectral bandwidth of SPDC in nearly frequency-degenerate conditions, and can provide more than 50 pairs of wavelength channels for the entangled photon pairs in the 1.5-μm wavelength band, using only standard optical resources in the telecom field. Visibilities higher than 98% were clearly observed in two-photon interference fringes for all the wavelength channels under investigation (eight pairs). We further performed a detailed experimental investigation of the cross-talk characteristics and the impact of detuning the pump wavelengths.

EA-Modulator-Based Optical Time Division Multiplexing/Demultiplexing Techniques for 160-Gb/s Optical Signal Transmission

Murai

Kagawa

Tsuji

et al. 2007

IEEE J. Select. Topics Quantum Electron.

Novel design scheme for high-speed MQW lasers with enhanced differential gain and reduced carrier transport effect

Matsui

Murai

Arahira

et al. 1998

IEEE J. Quantum Electron.

Hitoshi Murai

30-GHz bandwidth 1.55-μm strain-compensated InGaAlAs-InGaAsP MQW laser

XcalableACC: Extension of XcalableMP PGAS Language Using OpenACC for Accelerator Clusters
