In response to a joint call from the US NSF and the UK EPSRC for applications that utilize the combined computational resources of the US and UK, three computational science groups from UCL, Tufts and Brown Universities teamed up with a middleware team from NIU/Argonne to meet the challenge. Although the groups had three distinct codes and aims, the projects shared a common feature: each comprised large-scale distributed applications that required high-end networking and advanced middleware to be deployed effectively. For example, cross-site runs were found to be a very effective strategy for overcoming the limitations of a single resource. The seamless federation of a grid-of-grids remains difficult. Even if interoperability at the middleware and software stack levels were to exist, it would not guarantee that the federated grids could be utilized for large-scale distributed applications. There are important additional requirements, for example compatible and consistent usage policies, automated advance reservations and, most important of all, co-scheduling. This paper outlines the scientific motivation and describes why distributed resources are critical for all three projects. It documents the challenges encountered in using a grid-of-grids and some of the solutions devised in response.
Computational grids and grid middleware offer unprecedented computational power and storage capacity, and have thus opened the possibility of solving problems that were previously intractable on even the largest single computational resources. These opportunities notwithstanding, developing grid applications that run efficiently remains a challenge because of the heterogeneity of networks and system architectures inherent in such environments. We present grid solutions to two grand challenge problems in computational mechanics. To study the scalability of our solutions, we implemented both as MPI applications and ran them on the TeraGrid using NEKTAR and MPICH-G2. We present the results of our study, which demonstrate near-linear scalability in both applications when run across multiple TeraGrid sites and at a scale of hundreds of processors.
Grid Computing
The National Science Foundation's TeraGrid (TG) (http://www.teragrid.org) integrates the most powerful open resources in the US, which at present amount to about 50 teraflops of processing power and 1.5 petabytes of online storage, connected by a 40 Gb/s network. Unlike conventional supercomputers, it offers the opportunity for potentially unlimited scalability. The key question facing computational scientists, however, is how to adapt their applications effectively to such a complex and heterogeneous network. We are, indeed, at a crossroads in parallel scientific computing, similar to the one computational scientists faced about fifteen years ago.
The emergence of parallel software (e.g., MPI and OpenMP), and also of domain decomposition algorithms and corresponding freeware (e.g., METIS [14]), made parallel computing available to the wider scientific community and allowed first-principles simulations of turbulence at very fine scales, of blood flow in the human heart [15], and of global climate at a resolution of just a few kilometers.
On the other hand, simulations designed to capture detailed physicochemical, mechanical or biological processes have demonstrated quite different characteristics [2,4,5,17,18]. Some applications are computation intensive, requiring extremely powerful computing systems. Others are data intensive [1,3,16], necessitating the creation or mining of multi-terabyte data archives to extract scientific insight. Large-scale biological and physical simulations are extremely computation intensive, and are usually characterized by tightly coupled computations and communications. To harness the power of grid computing efficiently and effectively, it is necessary to design and adapt applications to exploit ensembles of supercomputers and to match application requirements and characteristics with grid resources.
The challenges in developing such grid-enabled applications lie primarily in the high degree of system heterogeneity and the dynamic behavior, in both architecture and performance, of the Grid environment. For example, a grid may have a highly heterogeneous and unbalanced communication network, whose bandwidth and latency charac...