This paper provides results on scaling Barrier and Allreduce to 8192 nodes on a cluster of Intel® Xeon Phi™ processors installed at the University of Tokyo and the University of Tsukuba. We describe the effects of OS and platform noise on the performance of these collectives and provide ways to minimize the noise as well as to isolate it to specific cores. We provide results showing that Barrier and Allreduce scale well when noise is reduced. At 4096 nodes, we achieved a latency of 94 usec (7.1x speedup from baseline) for Barrier with 1 rank per node, and 145 usec (3.3x speedup) for Allreduce at the 16-byte (16B) message size. CCS CONCEPTS • Networks → Network performance analysis; • Software and its engineering → Scheduling;
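As a worked check on these figures, and assuming that "speedup" means baseline latency divided by optimized latency (the abstract does not define it), the baseline latencies implied at 4096 nodes are about 94 × 7.1 ≈ 667.4 usec for Barrier and 145 × 3.3 ≈ 478.5 usec for Allreduce. The short Python sketch below reproduces that arithmetic; the helper name `implied_baseline_us` is hypothetical and not taken from the paper.

```python
# Hypothetical helper: recover the baseline latency implied by a reported
# speedup, ASSUMING speedup = baseline_latency / optimized_latency.

def implied_baseline_us(optimized_us: float, speedup: float) -> float:
    """Return the baseline latency (usec) implied by an optimized latency and a speedup."""
    if optimized_us <= 0 or speedup <= 0:
        raise ValueError("latency and speedup must be positive")
    return optimized_us * speedup


# Figures reported in the abstract for 4096 nodes.
RESULTS = {
    "Barrier (1 rank per node)": (94.0, 7.1),
    "Allreduce (16B)": (145.0, 3.3),
}

if __name__ == "__main__":
    for name, (opt_us, speedup) in RESULTS.items():
        base_us = implied_baseline_us(opt_us, speedup)
        print(f"{name}: {opt_us:.1f} usec optimized, {base_us:.1f} usec implied baseline")
```

Under the same assumption, noise reduction saved roughly 573 usec per Barrier and 333 usec per Allreduce call at this scale.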
Many wave-propagation analyses with varying geometries and material properties are expected to be useful for model optimization. Low-order unstructured finite-element methods are suitable for such analyses because they can model multi-material problems with complex geometries; however, their meshing and analysis costs are high. In this paper, we therefore developed a fast mesh generator and analysis method. The robust mesh generator was 17.4-fold faster than a conventional mesh generator, and the predictor algorithm for dynamic implicit finite-element solvers yielded a 1.69-fold speedup over conventional solvers and a 91.3% size-up efficiency on the full Oakforest-PACS system. We demonstrated the usability of the developed meshing and analysis methods via a wave-propagation simulation on a model of 1.9 billion unstructured tetrahedral elements using half of the K computer system (41,472 compute nodes).
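The abstract does not describe the predictor itself. As a generic illustration only, not the authors' algorithm, one common way to give an implicit time-stepping solver a good initial guess is to extrapolate a polynomial through the last few solution vectors; because the weights are computed from the actual time values, this also works with non-uniform (adaptive) step sizes, which the keyword "adaptive multistep method" hints at. The sketch below uses Lagrange extrapolation; the function names `extrapolation_weights` and `predict_initial_guess` are hypothetical.

```python
from typing import List, Sequence


def extrapolation_weights(times: Sequence[float], t_next: float) -> List[float]:
    """Lagrange basis values l_j(t_next) for the polynomial through the stored times."""
    weights = []
    for j, tj in enumerate(times):
        w = 1.0
        for m, tm in enumerate(times):
            if m != j:
                w *= (t_next - tm) / (tj - tm)
        weights.append(w)
    return weights


def predict_initial_guess(history: Sequence[Sequence[float]],
                          times: Sequence[float],
                          t_next: float) -> List[float]:
    """Extrapolate the solution vector at t_next from past solution vectors.

    history[j] is the solution vector at times[j]; step sizes may vary.
    """
    if not history or len(history) != len(times):
        raise ValueError("need one time value per stored solution vector")
    weights = extrapolation_weights(times, t_next)
    n = len(history[0])
    guess = [0.0] * n
    for w, u in zip(weights, history):
        for i in range(n):
            guess[i] += w * u[i]
    return guess


if __name__ == "__main__":
    # Non-uniform past steps; each component follows a quadratic in time,
    # so 3-point extrapolation should reproduce u(0.4) = 2.28 up to rounding.
    times = [0.0, 0.1, 0.25]
    u = lambda t: 1.0 + 2.0 * t + 3.0 * t * t
    history = [[u(t), 2.0 * u(t)] for t in times]
    print(predict_initial_guess(history, times, 0.4))
```

The closer such a guess is to the converged solution, the fewer iterations an iterative solver such as conjugate gradient typically needs per time step; how the paper's predictor actually selects its history and order is not stated in the abstract.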
KEYWORDS finite-element method, tetrahedral mesh generation, adaptive multistep method, wave propagation