Cho‐Li Wang scite author profile

Cho‐Li Wang

4 Publications

129 Citation Statements Received

101 Citation Statements Given

Affiliations

University of Hong Kong, Chinese University of Hong Kong, Hong Kong University of Science and Technology

Publications

Optimization of cloud task processing with checkpoint-restart mechanism

Robert, Vivien et al. 2013

In this paper, we aim to optimize fault-tolerance techniques based on a checkpoint/restart mechanism in the context of cloud computing. Our contribution is threefold. (1) We derive a fresh formula to compute the optimal number of checkpoints for cloud jobs with varied distributions of failure events. Our analysis is not only generic, with no assumption on the failure probability distribution, but also attractively simple to apply in practice. (2) We design an adaptive algorithm to optimize the checkpointing effect with respect to various costs, such as checkpointing/restart overhead. (3) We evaluate our optimized solution in a real cluster environment with hundreds of virtual machines and the Berkeley Lab Checkpoint/Restart tool. Task failure events are emulated via a production trace produced on a large-scale Google data center. Experiments confirm that our solution is fairly suitable for Google systems. Our optimized formula outperforms Young's formula by 3-10 percent, reducing wall-clock lengths by 50-100 seconds per job on average.
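For orientation, the baseline named in the abstract, Young's formula, can be sketched as follows. This is only the classical first-order approximation, interval = sqrt(2 × checkpoint cost × mean time between failures); it is not the paper's own formula, which the abstract does not state. The function names and the sample values are hypothetical, chosen for illustration.

```python
import math


def young_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's first-order approximation of the optimal checkpoint interval.

    checkpoint_cost_s: time to write one checkpoint, in seconds.
    mtbf_s: mean time between failures, in seconds.
    """
    if checkpoint_cost_s <= 0 or mtbf_s <= 0:
        raise ValueError("checkpoint cost and MTBF must be positive")
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


def checkpoint_count(task_length_s: float, interval_s: float) -> int:
    """Number of checkpoints taken when checkpointing every interval_s seconds.

    No checkpoint is needed at the very end of the task, hence the -1.
    """
    return max(0, math.ceil(task_length_s / interval_s) - 1)


if __name__ == "__main__":
    # Hypothetical values: 2 s checkpoint cost, 1 h MTBF, 10 min task.
    interval = young_interval(2.0, 3600.0)
    print(interval, checkpoint_count(600.0, interval))
```

Young's formula assumes failures arrive with a fixed mean rate; the abstract's claim is that the paper's formula drops that kind of distributional assumption.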

Dynamic Optimization of Multiattribute Resource Allocation in Self-Organizing Clouds

Wang

2013

IEEE Trans. Parallel Distrib. Syst.

By leveraging virtual machine (VM) technology, which provides performance and fault isolation, Cloud resources can be provisioned on demand in a fine-grained, multiplexed manner rather than in monolithic pieces. By integrating volunteer computing into Cloud architectures, we envision a gigantic Self-Organizing Cloud (SOC) being formed to reap the huge potential of untapped commodity computing power over the Internet. Towards this new architecture, where each participant may autonomously act as both resource consumer and provider, we propose a fully distributed, VM-multiplexing resource allocation scheme to manage decentralized resources. Our approach not only maximizes resource utilization using the proportional share model (PSM), but also delivers provably and adaptively optimal execution efficiency. We also design a novel multi-attribute range query protocol for locating qualified nodes. In contrast to existing solutions, which often generate bulky messages per request, our protocol produces only one lightweight query message per task on the Content Addressable Network (CAN). It works effectively to find, for each task, its qualified resources under a randomized policy that mitigates contention among requesters. We show that the SOC with our optimized algorithms can improve system throughput by 15%-60% over a P2P Grid model. Our solution also exhibits fairly high adaptability in a dynamic node-churning environment.
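As a rough illustration of the proportional share idea mentioned in the abstract, the sketch below splits one resource on one node among tasks in proportion to their weights. It is a generic textbook form of proportional sharing, not the paper's allocation scheme; the function name, the weights, and the capacity value are all assumptions.

```python
from typing import Dict


def proportional_share(capacity: float, weights: Dict[str, float]) -> Dict[str, float]:
    """Split `capacity` of one resource among tasks in proportion to their weights."""
    if capacity < 0:
        raise ValueError("capacity must be non-negative")
    if any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be positive")
    total = sum(weights.values())
    # Each task receives capacity * (its weight / sum of all weights).
    return {task: capacity * w / total for task, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical node with 8 CPU cores shared by two tasks weighted 1 and 3.
    print(proportional_share(8.0, {"task_a": 1.0, "task_b": 3.0}))
```

Because every unit of capacity is handed out whenever at least one task is present, a scheme of this shape leaves no resource idle, which is the utilization property the abstract attributes to PSM.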

Designing SSI clusters with hierarchical checkpointing and single I/O space

Hwang, Jin, Chow et al. 1999

IEEE Concurrency

In a cluster of computers, local area networks or high-bandwidth switch networks using optical fibers physically connect a collection of node computers. The workstations in a cluster can work collectively as an integrated computing resource (that is, as a single system image, or SSI), or they can operate separately as individual computers. Present clusters are usually small and provide only limited SSI services. Future clusters will likely increase in scalability and offer more SSI support, as Figure 1 illustrates. The implication is that future clusters could replace the MPP, SMP, or CC-NUMA architectures (see "The cluster as a computer architecture" sidebar for key characteristics of these computer platforms). We focus on clusters with high availability through SSI support, distributed RAID (redundant arrays of inexpensive disks) with parity checks, and hierarchical checkpointing with adaptive recovery. In particular, we developed a single I/O address space among all disks and peripheral devices attached in the cluster. This enables direct remote disk access, which is a necessary step to implement a …

Adopting a new hierarchical checkpointing architecture, the authors develop a single I/O address space for building highly available clusters of computers. They propose a systematic approach to achieving single system image by integrating existing middleware support with the newly developed features.

The computing trend is moving from clustering high-end mainframes to clustering desktop computers. This trend is triggered by the widespread use of PCs, workstations, gigabit networks, and middleware support for clustering. This article presents new approaches to achieving fault tolerance and single system image …

JESSICA: Java-Enabled Single-System-Image Computing Architecture

J.M., Wang, Lau 2000

Journal of Parallel and Distributed Computing
