A new method is presented for distributing data in sparse matrix-vector multiplication. The method is two-dimensional, tries to minimize the true communication volume, and also tries to spread the computation and communication work evenly over the processors. The method starts with a recursive bipartitioning of the sparse matrix, each time splitting a rectangular matrix into two parts with a nearly equal number of nonzeros. The communication volume caused by the split is minimized. After the matrix partitioning, the input and output vectors are partitioned with the objective of minimizing the maximum communication volume per processor. Experimental results of our implementation, Mondriaan, for a set of sparse test matrices show a reduction in communication volume compared to one-dimensional methods, and in general a good balance in the communication work. Experimental timings of an actual parallel sparse matrix-vector multiplication on an SGI Origin 3800 computer show that a sufficiently large reduction in communication volume leads to savings in execution time.
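To make the quantity being minimized concrete, the following is a minimal sketch of how the communication volume of a given two-dimensional nonzero distribution could be counted. It assumes the usual measure in which a row or column whose nonzeros are spread over λ processors contributes λ − 1 data words. The function name `communication_volume` and the `(row, col, part)` input format are hypothetical and are not taken from the Mondriaan package.

```python
from collections import defaultdict

def communication_volume(nonzeros):
    """Count communication volume for a distributed sparse matrix.

    nonzeros: iterable of (row, col, part) triples, where `part` is the
    processor that owns the nonzero (hypothetical input format).
    Assumes a row or column spread over lam processors costs lam - 1 words.
    """
    row_parts = defaultdict(set)  # processors that own nonzeros in each row
    col_parts = defaultdict(set)  # processors that own nonzeros in each column
    for i, j, p in nonzeros:
        row_parts[i].add(p)
        col_parts[j].add(p)
    # Columns cut by the partitioning: input vector components must be sent out.
    fan_out = sum(len(parts) - 1 for parts in col_parts.values())
    # Rows cut by the partitioning: partial sums must be gathered.
    fan_in = sum(len(parts) - 1 for parts in row_parts.values())
    return fan_out + fan_in
```

For a dense 2 × 2 matrix split by columns over two processors, both rows are cut and no column is, so this count gives a volume of 2.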
BSPlib is a small communications library for bulk synchronous parallel (BSP) programming which consists of only 20 basic operations. This paper presents the full definition of BSPlib in C, motivates the design of its basic operations, and gives examples of their use. The library enables programming in two distinct styles: direct remote memory access (DRMA) using put or get operations, and bulk synchronous message passing (BSMP). Currently, implementations of BSPlib exist for a variety of modern architectures, including massively parallel computers with distributed memory, shared memory multiprocessors, and networks of workstations. BSPlib has been used in several scientific and industrial applications; this paper briefly describes applications in benchmarking, Fast Fourier Transforms (FFTs), sorting, and molecular dynamics.
A prototypical problem on which techniques for exact enumeration are tested and compared is the enumeration of self-avoiding walks. Here, we show an advance in the methodology of enumeration, making the process thousands or millions of times faster. This allowed us to enumerate self-avoiding walks on the simple cubic lattice up to a length of 36 steps.
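For reference, the enumeration problem itself can be stated as a naive depth-first search. The sketch below is only the brute-force baseline, not the faster method the abstract refers to, and the function name `count_saws` is a hypothetical choice. Its running time grows exponentially with the walk length, so it is practical only for short walks.

```python
def count_saws(n):
    """Count n-step self-avoiding walks on the simple cubic lattice.

    Naive depth-first enumeration starting from the origin; this is a
    baseline illustration, not the accelerated method of the paper.
    """
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    visited = {(0, 0, 0)}

    def extend(pos, remaining):
        if remaining == 0:
            return 1  # one complete walk found
        total = 0
        for dx, dy, dz in steps:
            nxt = (pos[0] + dx, pos[1] + dy, pos[2] + dz)
            if nxt not in visited:  # self-avoidance constraint
                visited.add(nxt)
                total += extend(nxt, remaining - 1)
                visited.remove(nxt)  # backtrack
        return total

    return extend((0, 0, 0), n)
```

For walks of 1 to 4 steps this gives 6, 30, 150, and 726 walks.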
In this article, we introduce a cache-oblivious method for sparse matrix-vector multiplication. Our method attempts to permute the rows and columns of the input matrix using a recursive hypergraph-based sparse matrix partitioning scheme so that the resulting matrix induces cache-friendly behavior during sparse matrix-vector multiplication. Matrices are assumed to be stored in row-major format, by means of the compressed row storage (CRS) or its variants incremental CRS and zig-zag CRS. The zig-zag CRS data structure is shown to fit well with the hypergraph metric used in partitioning sparse matrices for the purpose of parallel computation. The separated block-diagonal (SBD) form is shown to be the appropriate matrix structure for cache enhancement. We have implemented a run-time cache simulation library enabling us to analyze cache behavior for arbitrary matrices and arbitrary cache properties during matrix-vector multiplication within a k-way set-associative idealized cache model. The results of these simulations are then verified by actual experiments run on various cache architectures. In all these experiments, we use the Mondriaan sparse matrix partitioner in one-dimensional mode. The savings in computation time achieved by our matrix reorderings reach up to 50 percent, in the case of a large link matrix.
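As background for the storage scheme mentioned above, here is a minimal sketch of sparse matrix-vector multiplication with plain CRS. It covers only the standard format, not the incremental or zig-zag variants, and the names `crs_spmv`, `row_ptr`, `col_idx`, and `values` are illustrative assumptions. The indirect accesses `x[col_idx[k]]` are the irregular memory references that a row and column reordering would aim to make more cache-friendly.

```python
def crs_spmv(row_ptr, col_idx, values, x):
    """Compute y = A x for a matrix A stored in compressed row storage.

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i;
    col_idx[k] and values[k] give the column and value of nonzero k.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect access into x: its locality depends on the column order.
            y[i] += values[k] * x[col_idx[k]]
    return y
```

For example, the matrix with rows (1, 0, 2), (0, 3, 0), (4, 0, 5) is stored as `row_ptr = [0, 2, 3, 5]`, `col_idx = [0, 2, 1, 0, 2]`, `values = [1, 2, 3, 4, 5]`.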
The vibrational predissociation dynamics of a collinear model of the I₂(v)He cluster is studied by numerically exact time-dependent quantum mechanics, and by the time-dependent self-consistent field (TDSCF) approximation. The time evolution for the initial excitation levels v = 5, 11, 22 is explored. Excellent agreement is found between the TDSCF and the exact evolution of the wave packet; in particular the approximation reproduces well the dephasing events in the dynamics, and the measurable predissociation lifetimes. The results are very encouraging as to the applicability of quantum TDSCF as a quantitative tool in the study of van der Waals predissociation dynamics.
The atomic structure of amorphous materials is believed to be well described by the continuous-random-network model. We present an algorithm for the generation of large, high-quality continuous random networks. The algorithm is a variation of the sillium approach introduced by Wooten, Winer, and Weaire [Phys. Rev. Lett. 54, 1392 (1985)]. By employing local relaxation techniques, local atomic rearrangements can be tried that scale almost independently of system size. This scaling property of the algorithm paves the way for the generation of realistic device-size atomic networks.
Time-dependent wave packet propagation of resonance states of ABA molecules is used to demonstrate the correlation between the directionality of the lobes of the wave functions and mode selectivity of the unimolecular decay. This correlation was inferred by Hose and Taylor. The molecule is modeled by the Thiele–Wilson coupled Morse oscillators. A near-degenerate pair of resonances with extreme motions is studied in detail: the local “bond” mode with lobes pointing towards the exit valleys of the potential decays about 30 times faster than the hyperspherical “restricted precession” mode with dominant lobe on the potential ridge. This is in close analogy to mode selectivity in the Hénon–Heiles system. The wave function propagation technique also yields detailed insight into the dissociation mechanism. Out of several choices, only a single lobe penetrates into the exit valley. For the local mode resonance, vibrational predissociation starts out primarily from extended vibrationally excited diatomic configurations, A↔B(ν*=1)⋯A→AB(ν′=0)+A. However, the hyperspherical mode resonance prefers compressed diatomic geometry just before dissociation, AB(ν*=3)⋯A→AB(ν′=0)+A. The results imply some general criteria for mode selective unimolecular chemical reactions, as well as a successful numerical test of the preparation of resonance wave functions and their propagation by the Fourier method.