The Grid Datafarm (Gfarm) architecture is designed for global petascale data-intensive computing. It provides a global parallel filesystem with online petascale storage, scalable I/O bandwidth, and scalable parallel processing, and it can exploit local I/O in a grid of clusters with tens of thousands of nodes. Gfarm parallel I/O APIs and commands provide a single filesystem image and manipulate filesystem metadata consistently. Fault tolerance and load balancing are automatically managed by file duplication or re-computation using a command history log. Preliminary performance evaluation has shown scalable disk I/O and network bandwidth on 64 nodes of the Presto III Athlon cluster. The Gfarm parallel I/O write and read operations has achieved data transfer rates of 1.74 GB/s and 1.97 GB/s, respectively, using 64 cluster nodes. The Gfarm parallel file copy reached 443 MB/s with 23 parallel streams on the Myrinet 2000. The Gfarm architecture is expected to enable petascale data-intensive Grid computing with an I/O bandwidth scales to the TB/s range and scalable computational power.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.