Luc Bougé scite author profile

As data volumes grow rapidly in more and more application fields of science, engineering, information services, etc., the challenges posed by data-intensive computing gain increasing importance. The emergence of highly scalable infrastructures, e.g. for cloud computing and for petascale computing and beyond, introduces additional issues for which scalable data management becomes an immediate need. This paper brings several contributions. First, it proposes a set of principles for designing highly scalable distributed storage systems that are optimized for heavy data access concurrency. In particular, we highlight the potentially large benefits of using versioning in this context. Second, based on these principles, we propose a set of versioning algorithms, both for data and metadata, that enable a high throughput under concurrency. Finally, we implement and evaluate these algorithms in the BlobSeer prototype, which we integrate as a storage backend in the Hadoop MapReduce framework. We perform extensive microbenchmarks as well as experiments with real MapReduce applications: they demonstrate that applying the principles defended in our approach brings substantial benefits to data-intensive applications.

On the existence of symmetric algorithms to find leaders in networks of communicating sequential processes

Bougé

1988

Acta Informatica

Test sets generation from algebraic specifications using logic programming

Bougé

Choquet

Fribourg

et al. 1986

Journal of Systems and Software

A performance evaluation of Azure and Nimbus clouds for scientific applications

Tudoran

Costan

Antoniu

et al. 2012

The emergence of cloud computing brought the opportunity to use large-scale computational infrastructures for a broad spectrum of scientific applications. As more and more cloud providers and technologies appear, scientists are faced with an increasingly difficult problem of evaluating various offerings, like public and private clouds, and deciding which model to use for their applications' needs. In this paper, we make a performance evaluation of two public and private cloud platforms for scientific computing workloads. We compare the Azure and Nimbus clouds, considering all the primary needs of scientific applications (computation power, storage, data transfers and costs). The evaluation is done using both synthetic benchmarks and a real-life application. Our results show that Nimbus incurs less variability and has increased support for data-intensive applications, while Azure deploys faster and has a lower cost.

The Hyperion system: Compiling multithreaded Java bytecode for distributed execution

et al. 2001

A preliminary version of this work was presented as a Distinguished Paper at the Euro-Par 2000 Conference, Munich, Germany, August 2000. Our work combines Java compilation to native code with a runtime library that executes Java threads in a distributed memory environment. This allows a Java programmer to view a cluster of processors as executing a single Java virtual machine. The separate processors are simply resources for executing Java threads with true parallelism, and the runtime system provides the illusion of a shared memory on top of the private memories of the processors. The environment we present is available on top of several UNIX systems and can use a large variety of communication interfaces thanks to the high portability of its runtime system. To evaluate our approach, we compare serial C, serial Java, and multithreaded Java implementations of a branch-and-bound solution to the minimal-cost map-coloring problem. All measurements have been carried out on two platforms using two different communication interfaces: SISCI/SCI and MPI BIP/Myrinet.

An efficient and transparent thread migration scheme in the PM2 runtime system

Antoniu

Bougé

Namyst

1999

Bridging Data in the Clouds: An Environment-Aware System for Geographically Distributed Data Transfers

Tudoran

Costan

Wang

et al. 2014

Today's continuously growing cloud infrastructures provide support for processing ever-increasing amounts of scientific data. Cloud resources for computation and storage are spread among globally distributed datacenters. Thus, to leverage the full computation power of the clouds, global data processing across multiple sites has to be fully enabled. However, managing data across geographically distributed datacenters is not trivial, as it involves high and variable latencies among sites, which come at a high monetary cost. In this work, we propose a uniform data management system for scientific applications running across geographically distributed sites. Our solution is environment-aware, as it monitors and models the global cloud infrastructure, and offers predictable data handling performance for transfer cost and time. In terms of efficiency, it provides the applications with the possibility to set a tradeoff between money and time and optimizes the transfer strategy accordingly. The system was validated on Microsoft's Azure cloud across the 6 EU and US datacenters. The experiments were conducted on hundreds of nodes using both synthetic benchmarks and the real-life A-Brain application. The results show that our system is able to model and predict the cloud performance well and to leverage this into efficient data dissemination. Our approach reduces the monetary costs and transfer time by up to 3 times.

A compositional approach to superimposition

Bougé

Francez

1988

Luc Bougé

BlobSeer: Next-generation data management for large scale infrastructures
