Abstract-This paper presents a theoretical analysis and practical evaluation of the main bottlenecks that stand in the way of a scalable distributed solution for training Deep Neural Networks (DNNs). The results show that the current state-of-the-art approach, data-parallel Stochastic Gradient Descent (SGD), is quickly becoming a severely communication-bound problem. In addition, we present simple but firm theoretical constraints that prevent effective scaling of DNN training beyond a few dozen nodes, leading to poor scalability of DNN training in most practical scenarios.
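As a rough illustration of the approach the abstract critiques (not the paper's own code), the sketch below shows one step of data-parallel SGD on a toy least-squares problem: each worker computes a gradient on its own data shard, and the gradients are then averaged, which in a real cluster is the all-reduce communication step that becomes the bottleneck. The function and variable names are illustrative assumptions.

```python
import numpy as np

def grad_fn(w, shard):
    """Toy least-squares gradient of mean((w*x - y)^2) on one data shard."""
    x, y = shard
    return np.mean(2 * x * (w * x - y))

def data_parallel_sgd_step(w, shards, grad_fn, lr=0.05):
    """One data-parallel SGD step.

    Each worker computes a gradient on its local shard (embarrassingly
    parallel); the average over workers stands in for the all-reduce that
    dominates communication cost at scale.
    """
    local_grads = [grad_fn(w, shard) for shard in shards]  # per-worker compute
    avg_grad = np.mean(local_grads)                        # "communication" step
    return w - lr * avg_grad
```

A usage example: fitting y = 3x from two shards, the weight converges to 3 after a few dozen steps, while every step still pays the full gradient-exchange cost regardless of the number of nodes.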
We introduce an algebro-geometrically motivated integration-by-parts (IBP) reduction method for multi-loop and multi-scale Feynman integrals, using a framework for massively parallel computations in computer algebra. This framework combines the computer algebra system Singular with the workflow management system GPI-Space, which are being developed at TU Kaiserslautern and the Fraunhofer Institute for Industrial Mathematics (ITWM), respectively. In our approach, the IBP relations are first trimmed by modern tools from computational algebraic geometry and then solved by sparse linear algebra and our new interpolation method. Modelled in terms of Petri nets, these steps are efficiently automated and automatically parallelized by GPI-Space. We demonstrate the potential of our method on the nontrivial example of reducing two-loop five-point nonplanar double-pentagon integrals. We also use GPI-Space to convert the basis of IBP reductions, and discuss the possible simplification of IBP coefficients in a uniformly transcendental basis.
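For readers outside the field, the IBP relations referred to above are the standard identities obtained in dimensional regularization from the vanishing of a total derivative under the loop integral; a generic form (with illustrative symbols, not the paper's notation) is

\[
0 \;=\; \int \prod_{i=1}^{L} \frac{\mathrm{d}^{D}\ell_i}{i\pi^{D/2}}\;
\frac{\partial}{\partial \ell_j^{\mu}}
\left( \frac{v^{\mu}}{D_1^{a_1} D_2^{a_2} \cdots D_n^{a_n}} \right),
\]

where the \(D_k\) are inverse propagators, the \(a_k\) integer powers, and \(v^{\mu}\) is any loop or external momentum. Expanding the derivative yields linear relations among integrals with shifted powers \(a_k\); solving these large, sparse linear systems is the step the abstract's trimming, sparse linear algebra, and interpolation techniques accelerate.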
Reverse Time Migration (RTM) has become the standard for high-quality imaging in the seismic industry. RTM relies on PDE solvers using stencils of 8th order or larger, which require large-scale HPC clusters to meet the computational demands. However, the rising power consumption of conventional cluster technology has prompted investigation of architectural alternatives that offer higher computational efficiency. In this work, we compare the performance and energy efficiency of three architectural alternatives: the Intel Nehalem X5530 multicore processor, the NVIDIA Tesla C2050 GPU, and a general-purpose manycore chip design optimized for high-order wave equations called "Green Wave." We have developed an FPGA-accelerated architectural simulation platform to accurately model the power and performance of the Green Wave design. Results show that across highly-tuned high-order RTM stencils, the Green Wave implementation can offer up to 8× and 3.5× improvements in energy efficiency per node compared with the Nehalem and GPU platforms, respectively. These results point to the enormous potential energy advantages of our hardware/software co-design methodology.
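To make the computational kernel concrete, the sketch below applies a standard 8th-order central-difference stencil for the second derivative along one axis, the building block of the high-order wave-equation updates the abstract describes. This is a minimal one-dimensional NumPy illustration, not the tuned multi-dimensional kernels benchmarked in the paper.

```python
import numpy as np

# Standard 8th-order central-difference coefficients for d^2/dx^2
# (9-point stencil, offsets -4..+4).
C = np.array([-1/560, 8/315, -1/5, 8/5, -205/72, 8/5, -1/5, 8/315, -1/560])

def second_derivative_8th(f, dx):
    """Apply the 8th-order stencil to the interior points of a 1-D array.

    Each interior output point reads 9 neighbours, which is why high-order
    RTM stencils are so demanding on memory bandwidth.
    """
    n = len(f)
    out = np.zeros_like(f)
    for k, c in zip(range(-4, 5), C):
        out[4:n - 4] += c * f[4 + k:n - 4 + k]
    return out / dx**2
```

Applied to f(x) = x², the interior values come out as 2 to within floating-point error, since the stencil is exact on low-degree polynomials; a real RTM kernel repeats this along every axis at every time step over billions of grid points.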