2014
DOI: 10.1007/978-3-319-06486-4

High-Performance Computing on the Intel® Xeon Phi™

Abstract: The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that…

Cited by 50 publications (30 citation statements)
References: 0 publications
“…Several libraries and parallelization methods can be employed in order to exploit all the capabilities of the Xeon Phi. 9,11,12,34 Intel's math kernel library (MKL) provides several mathematical functions that are already optimized for the Xeon Phi coprocessor. In particular, MCsquare uses the MCG59 random number generator of this library.…”
Section: F. Code Implementation
confidence: 99%
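The MCG59 generator mentioned above is documented by Intel as a 59-bit multiplicative congruential generator with multiplier a = 13^13 and modulus 2^59. A minimal pure-Python sketch of that recurrence (an illustration only, not the optimized MKL implementation) might look like:

```python
# Sketch of the MCG59 recurrence x_{n+1} = (a * x_n) mod 2^59,
# using the published parameters of MKL's VSL_BRNG_MCG59.
# Not the MKL implementation -- for illustration only.

A = 13 ** 13          # multiplier a = 13^13
M = 2 ** 59           # modulus 2^59

def mcg59(seed: int, n: int) -> list[float]:
    """Return n uniform samples in (0, 1) from the MCG59 recurrence."""
    x = seed % M
    if x == 0:
        x = 1         # a multiplicative generator must not start at 0
    out = []
    for _ in range(n):
        x = (A * x) % M
        out.append(x / M)  # scale the state into (0, 1)
    return out

samples = mcg59(seed=42, n=5)
```

The real MKL stream API delivers vectors of numbers per call and is heavily optimized; this loop only shows the underlying recurrence.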
“…Intermediate approaches also exist, like the recently introduced Intel Xeon Phi coprocessors. [9][10][11][12] They are available as affordable extension cards for workstations, just like GPUs. They combine the advantages of clusters, with many independent calculation units, and those of GPUs, with access to a shared memory and vectorized calculation.…”
Section: Introduction
confidence: 99%
“…For the novel algorithms, absolute tolerances for the Lanczos iterations and the REML optimization procedure were set to 5e-5 and 1e-5, respectively. Additionally, we compared our interpreted Python 3.6 code to BOLT-LMM versions 2.1 and 2.3.3 (C++ code compiled against the Intel MKL and Boost libraries) [5,6,24,25]. We ran each algorithm twenty times per condition, trimming away the two most extreme timings in each condition.…”
Section: Numerical Experiments
confidence: 99%
“…Novel algorithms were implemented in the Python v3.6.5 computing environment [20], using NumPy v1.14.3 and SciPy v1.1.0 compiled against the Intel Math Kernel Library v2018.0.2 [25][26][27]. Optimization was performed using SciPy's implementation of Brent's method, with convergence determined via absolute tolerance of the standardized genomic variance component ĥ².…”
Section: Numerical Experiments
confidence: 99%
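A one-dimensional optimization with an absolute tolerance, as described above, can be sketched with SciPy's bounded variant of Brent's method. The toy quadratic objective below merely stands in for the (negative) REML log-likelihood, which is not reproduced here; the bounds [0, 1] reflect that ĥ² is a variance proportion:

```python
from scipy.optimize import minimize_scalar

# Toy stand-in for the negative REML log-likelihood in h^2;
# the real objective from the cited paper is not reproduced here.
def neg_loglik(h2):
    return (h2 - 0.3) ** 2

# The bounded variant of Brent's method accepts an *absolute*
# x-tolerance via `xatol`, matching the absolute-tolerance
# convergence criterion described above (1e-5).
res = minimize_scalar(neg_loglik, bounds=(0.0, 1.0), method="bounded",
                      options={"xatol": 1e-5})
```

Here `res.x` converges to the minimizer 0.3 within the requested absolute tolerance.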
“…One MKL RNG interface API call can deliver an arbitrary number of random numbers. In our program, a maximum of 64 K random numbers are delivered in one call [30]. A thread generates the required number of random numbers for each task.…”
Section: // Wait for All Tasks to Be Finished //
confidence: 99%
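The chunked generation pattern described above (an RNG call can return an arbitrary count, capped here at 64 K numbers per call) can be sketched in Python, with NumPy's `Generator` standing in for the MKL VSL stream:

```python
import numpy as np

CHUNK = 64 * 1024  # cap of 64 K random numbers per call, as described above

def random_stream(total, seed=0, chunk=CHUNK):
    """Yield `total` uniform random numbers in arrays of at most
    `chunk` per call.  NumPy's Generator is a stand-in for the
    per-thread MKL VSL stream used by the cited program."""
    rng = np.random.default_rng(seed)
    remaining = total
    while remaining > 0:
        n = min(chunk, remaining)
        yield rng.random(n)   # one "RNG call" delivering n numbers
        remaining -= n

chunk_sizes = [len(c) for c in random_stream(200_000)]
```

In the cited design each thread owns its own stream and draws the numbers its task needs, so no synchronization is required on the generator state.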