Abstract. In previous work we have shown that the MapReduce framework for distributed computation can be deployed for highly scalable inference over RDF graphs under the RDF Schema semantics. Unfortunately, several key optimizations that enabled the scalable RDFS inference do not generalize to the richer OWL semantics. In this paper we analyze these problems, and we propose solutions to overcome them. Our solutions allow distributed computation of the closure of an RDF graph under the OWL Horst semantics. We demonstrate the WebPIE inference engine, built on top of the Hadoop platform and deployed on a compute cluster of 64 machines. We have evaluated our approach using two real-world datasets (UniProt and LDSR, about 0.9-1.5 billion triples) and a synthetic benchmark (LUBM, up to 100 billion triples). Results show that our implementation is scalable and vastly outperforms current systems in terms of supported language expressivity, maximum data size and inference speed.
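To illustrate the style of computation the abstract refers to, the sketch below expresses one RDFS inference rule (rdfs9: subclass inheritance of type assertions) as a map/reduce-style join. The `Triple` record, the choice of join key, and the in-memory grouping are illustrative assumptions for exposition, not WebPIE's actual Hadoop implementation.

```java
import java.util.*;

// Sketch of one RDFS rule (rdfs9) in map/reduce style:
// (a rdfs:subClassOf b) AND (x rdf:type a)  =>  (x rdf:type b)
public class Rdfs9Sketch {
    record Triple(String s, String p, String o) {}

    static final String TYPE = "rdf:type";
    static final String SUBCLASS = "rdfs:subClassOf";

    // Map phase: emit each relevant triple keyed on the join term (the class a).
    static Map<String, List<Triple>> map(List<Triple> triples) {
        Map<String, List<Triple>> groups = new HashMap<>();
        for (Triple t : triples) {
            String key = t.p().equals(SUBCLASS) ? t.s()
                       : t.p().equals(TYPE)     ? t.o() : null;
            if (key != null)
                groups.computeIfAbsent(key, k -> new ArrayList<>()).add(t);
        }
        return groups;
    }

    // Reduce phase: within each group, join type assertions with
    // subclass axioms that share the key, emitting derived triples.
    static List<Triple> reduce(Map<String, List<Triple>> groups) {
        List<Triple> derived = new ArrayList<>();
        for (List<Triple> group : groups.values()) {
            List<String> supers = new ArrayList<>();
            List<String> instances = new ArrayList<>();
            for (Triple t : group) {
                if (t.p().equals(SUBCLASS)) supers.add(t.o());
                else instances.add(t.s());
            }
            for (String x : instances)
                for (String c : supers)
                    derived.add(new Triple(x, TYPE, c));
        }
        return derived;
    }

    public static void main(String[] args) {
        List<Triple> input = List.of(
            new Triple("Cat", SUBCLASS, "Animal"),
            new Triple("felix", TYPE, "Cat"));
        System.out.println(reduce(map(input)));
        // derives (felix rdf:type Animal)
    }
}
```

The difficulty the abstract alludes to is that OWL Horst rules require joins between multiple instance triples, which is harder to key efficiently than this single schema/instance join.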
Java offers interesting opportunities for parallel computing. In particular, Java Remote Method Invocation provides an unusually flexible kind of Remote Procedure Call. Unlike RPC, RMI supports polymorphism, which requires the system to be able to download remote classes into a running application. Sun's RMI implementation achieves this kind of flexibility by passing around object type information and processing it at run time, which causes a major run-time overhead. Using Sun's JDK 1.1.4 on a Pentium Pro/Myrinet cluster, for example, the latency for a null RMI (without parameters or a return value) is 1228 μs, which is about a factor of 40 higher than that of a user-level RPC. In this paper, we study an alternative approach for implementing RMI, based on native compilation. This approach allows for better optimization, eliminates the need for processing of type information at run time, and makes a lightweight communication protocol possible. We have built a Java system based on a native compiler, which supports both compile-time and run-time generation of marshallers. We find that almost all of the run-time overhead of RMI can be pushed to compile time. With this approach, the latency of a null RMI is reduced to 34 μs, while still supporting polymorphic RMIs (and allowing interoperability with other JVMs).
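The core idea, pushing marshalling work from run time to compile time, can be illustrated by contrasting Java's generic serialization (which inspects type information at run time) with a specialized marshaller that already knows the layout of the class it handles. The `Point` class and the hand-written marshaller below are illustrative assumptions; the paper's system generates such marshallers automatically from the class definitions.

```java
import java.io.*;

// Contrast: reflective Java serialization vs. a specialized marshaller
// that writes only raw field values (no type descriptors, no reflection).
public class MarshallerSketch {
    public static class Point implements Serializable {
        public int x, y;
        public Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Generic path: ObjectOutputStream emits class descriptors and
    // processes type information at run time.
    static byte[] genericMarshal(Point p) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(p);
        }
        return bos.toByteArray();
    }

    // Specialized path: a marshaller "generated" for Point writes the two
    // int fields directly -- 8 bytes on the wire, no per-call type work.
    static byte[] specializedMarshal(Point p) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(bos);
        dos.writeInt(p.x);
        dos.writeInt(p.y);
        return bos.toByteArray();
    }

    static Point specializedUnmarshal(byte[] buf) throws IOException {
        DataInputStream dis = new DataInputStream(new ByteArrayInputStream(buf));
        return new Point(dis.readInt(), dis.readInt());
    }

    public static void main(String[] args) throws IOException {
        Point p = new Point(3, 4);
        System.out.println("generic:     " + genericMarshal(p).length + " bytes");
        System.out.println("specialized: " + specializedMarshal(p).length + " bytes");
    }
}
```

The specialized path is what a compile-time marshaller generator makes possible; the remaining challenge, which the paper addresses with run-time marshaller generation, is classes that are only downloaded after compilation (the polymorphic case).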
Divide-and-conquer programs are easily parallelized by letting the programmer annotate potential parallelism in the form of spawn and sync constructs. To achieve efficient program execution, the generated work load has to be balanced evenly among the available CPUs. For single-cluster systems, Random Stealing (RS) is known to achieve optimal load balancing. However, RS is inefficient when applied to hierarchical wide-area systems where multiple clusters are connected via wide-area networks (WANs) with high latency and low bandwidth. In this paper, we experimentally compare RS with existing load-balancing strategies that are believed to be efficient for multi-cluster systems: Random Pushing and two variants of Hierarchical Stealing. We demonstrate that, in practice, they obtain less than optimal results. We introduce a novel load-balancing algorithm, Cluster-aware Random Stealing (CRS), which is highly efficient and easy to implement. CRS adapts itself to network conditions and job granularities, and does not require manually-tuned parameters. Although CRS sends more data across the WANs, it is faster than its competitors for 11 out of 12 test applications with various WAN configurations. It has at most 4% overhead in run time compared to RS on a single, large cluster, even with high wide-area latencies and low wide-area bandwidths. These strong results suggest that divide-and-conquer parallelism is a useful model for writing distributed supercomputing applications on hierarchical wide-area systems.
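The distinguishing feature of CRS, as described in the abstract, is that a starving node overlaps one asynchronous steal attempt over the WAN with continued synchronous stealing inside its own cluster. The victim-selection sketch below captures that idea in a few lines; the node-id scheme, the `pendingWideAreaSteal` flag, and the method names are illustrative assumptions, not the published implementation.

```java
import java.util.*;

// Sketch of Cluster-aware Random Stealing (CRS) victim selection.
// At most one wide-area steal request is outstanding at a time;
// local (intra-cluster) stealing continues while it is in flight.
public class CrsSketch {
    final List<Integer> localPeers;   // nodes in our own cluster
    final List<Integer> remotePeers;  // nodes in other clusters
    boolean pendingWideAreaSteal = false;
    final Random rnd;

    CrsSketch(List<Integer> local, List<Integer> remote, long seed) {
        this.localPeers = local;
        this.remotePeers = remote;
        this.rnd = new Random(seed);
    }

    // Called when the local work queue runs dry: returns the victims to try.
    List<Integer> chooseVictims() {
        List<Integer> victims = new ArrayList<>();
        if (!pendingWideAreaSteal && !remotePeers.isEmpty()) {
            // Fire one asynchronous steal over the WAN and don't wait for it.
            pendingWideAreaSteal = true;
            victims.add(remotePeers.get(rnd.nextInt(remotePeers.size())));
        }
        // Meanwhile, keep stealing synchronously from a random local node.
        victims.add(localPeers.get(rnd.nextInt(localPeers.size())));
        return victims;
    }

    // Invoked when the wide-area reply (work or failure) arrives.
    void wideAreaStealCompleted() { pendingWideAreaSteal = false; }
}
```

Because the expensive wide-area round trip is hidden behind cheap local steals, high WAN latency costs idle time only when the entire local cluster is out of work, which is why CRS needs no manually tuned parameters.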