Jalapeño is a virtual machine for Java™ servers written in the Java language. To address the requirements of servers (performance and scalability in particular), Jalapeño was designed "from scratch" to be as self-sufficient as possible. Jalapeño's unique object model and memory layout allow a hardware null-pointer check as well as fast access to array elements, fields, and methods. Run-time services conventionally provided in native code are implemented primarily in Java. Java threads are multiplexed by virtual processors (implemented as operating system threads). A family of concurrent object allocators and parallel, type-accurate garbage collectors is supported. Jalapeño's interoperable compilers enable quasi-preemptive thread switching and precise location of object references. Jalapeño's dynamic optimizing compiler is designed to obtain high-quality code for methods that are observed to be frequently executed or computationally intensive.
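As a rough illustration of the threading model summarized above, the following Java sketch multiplexes several lightweight tasks onto a small number of "virtual processors", each backed by one OS-level thread. It is an assumption-laden illustration for this summary only, not Jalapeño source; names such as VirtualProcessorDemo and readyQueue are invented here.

    // Illustrative sketch only, not Jalapeño source code: a few lightweight
    // "Java threads" (plain Runnables here) are multiplexed onto two
    // "virtual processors", each backed by one OS-level thread.
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class VirtualProcessorDemo {
        public static void main(String[] args) throws InterruptedException {
            final int virtualProcessors = 2;                  // hypothetical count
            BlockingQueue<Runnable> readyQueue = new ArrayBlockingQueue<>(64);
            for (int i = 0; i < 8; i++) {                     // eight lightweight tasks
                final int id = i;
                readyQueue.put(() -> System.out.println(
                        "lightweight thread " + id + " on " + Thread.currentThread().getName()));
            }
            for (int p = 0; p < virtualProcessors; p++) {
                new Thread(() -> {                            // one OS thread per virtual processor
                    Runnable task;
                    while ((task = readyQueue.poll()) != null) {
                        task.run();                           // here: run to completion; the real VM
                    }                                         // switches at compiler-inserted yield points
                }, "virtual-processor-" + p).start();
            }
        }
    }

In the actual VM, thread switching is quasi-preemptive: the interoperable compilers insert yield points (for example, on loop back edges) at which a switch may occur and at which object references can be located precisely.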
Threads and concurrency constructs in Java introduce nondeterminism to a program's execution, which makes it hard to understand and analyze the execution behavior. Nondeterminism in execution behavior also makes it impossible to use execution replay for debugging, performance monitoring, or visualization. This paper discusses a record/replay tool for Java, DejaVu, that provides deterministic replay of a program's execution. In particular, this paper describes the idea of the logical thread schedule, which makes DejaVu efficient and independent of the underlying thread scheduler. The paper also discusses how to handle the various Java synchronization operations for record and replay. DejaVu has been implemented by modifying Sun Microsystems' Java Virtual Machine.
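The core bookkeeping behind a logical thread schedule can be sketched briefly: a global clock counts critical events (synchronization operations and shared-variable accesses), and during recording each thread logs the maximal intervals of consecutive clock ticks in which it executed such events. The class below is only a hedged illustration of that idea; LogicalScheduleRecorder and criticalEvent are invented names, not DejaVu's API.

    // Hedged sketch of logical-thread-schedule bookkeeping, not DejaVu's code:
    // a global clock counts critical events; each thread records the maximal
    // intervals of consecutive ticks during which it executed those events.
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.atomic.AtomicLong;

    final class LogicalScheduleRecorder {
        private final AtomicLong globalClock = new AtomicLong();
        // per-thread list of [firstTick, lastTick] intervals
        private final ThreadLocal<List<long[]>> intervals =
                ThreadLocal.withInitial(ArrayList::new);

        // Instrumentation would call this at every critical event in record mode.
        void criticalEvent() {
            long tick = globalClock.getAndIncrement();
            List<long[]> mine = intervals.get();
            long[] last = mine.isEmpty() ? null : mine.get(mine.size() - 1);
            if (last != null && last[1] == tick - 1) {
                last[1] = tick;                    // same thread kept running: extend the interval
            } else {
                mine.add(new long[] {tick, tick}); // another thread ran in between: new interval
            }
        }
    }

On replay, each thread would be made to wait until the global clock reaches the first tick of its next recorded interval before performing another critical event, reproducing the recorded order independently of the underlying scheduler.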
The Jalapeño Dynamic Optimizing Compiler is a key component of the Jalapeño Virtual Machine, a new Java™ Virtual Machine (JVM) designed to support efficient and scalable execution of Java applications on SMP server machines. This paper describes the design of the Jalapeño Optimizing Compiler and the implementation results that we have obtained thus far. To the best of our knowledge, this is the first dynamic optimizing compiler for Java that is being used in a JVM with a compile-only approach to program execution.
This paper presents a framework, based on global array data-flow analysis, to reduce communication costs in a program being compiled for a distributed memory machine. We introduce the available section descriptor, a novel representation of communication involving array sections. This representation allows us to apply techniques for partial redundancy elimination to obtain powerful communication optimizations. With a single framework, we are able to capture optimizations like (i) vectorizing communication, (ii) eliminating communication that is redundant on any control flow path, (iii) reducing the amount of data being communicated, (iv) reducing the number of processors to which data must be communicated, and (v) moving communication earlier to hide latency and to subsume previous communication. We show that the bidirectional problem of eliminating partial redundancies can be decomposed into simpler unidirectional problems even in the context of an array section representation, which makes the analysis procedure more efficient. We present results from a preliminary implementation of this framework, which are extremely encouraging, and demonstrate the effectiveness of this analysis in improving the performance of programs.

Distributed memory architectures are becoming increasingly popular as a viable and cost-effective method of building massively parallel computers. However, the absence of a global address space, and consequently the need for explicit message passing among processes, makes these machines very difficult to program. This has motivated the design of languages like High Performance Fortran [10], which allow the programmer to write sequential or shared-memory parallel programs that are annotated with directives specifying data decomposition. The compilers for these languages are responsible for partitioning the computation and generating the communication necessary to fetch values of non-local data referenced by a processor. A number of such prototype compilers have been developed [18, 33, 23, 26, 22, 25, 3, 15, 28]. Since the cost of interprocessor communication is usually orders of magnitude higher than the cost of accessing local data, it is extremely important for the compilers to optimize communication. The most common optimizations include message vectorization [18, 33], using collective communication [14, 23], and overlapping communication with computation [18]. However, most compilers perform little global analysis of the communication requirements across different loop nests. This precludes general optimizations, such as redundant communication elimination, or carrying out extra communication inside one loop nest if it subsumes communication required in the next loop nest.

This paper presents a framework, based on global array data-flow analysis, to reduce communication in a program. We apply techniques for partial redundancy elimination, discussed in the context of eliminating redundant computation by Morel and Renvoise [24], and later refined by other researchers [8, 20, 9]. The conventional approach to data-flow analysis regards...
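For orientation, the partial-redundancy-elimination techniques referred to above build on classical availability data-flow equations. A minimal, standard form is shown below; this is not the paper's exact formulation, which generalizes the lattice elements from scalar expressions to available section descriptors over array sections:

\[
\mathit{AVAIL}_{\mathrm{in}}(b) \;=\; \bigcap_{p \,\in\, \mathit{pred}(b)} \mathit{AVAIL}_{\mathrm{out}}(p),
\qquad
\mathit{AVAIL}_{\mathrm{out}}(b) \;=\; \mathit{GEN}(b) \,\cup\, \bigl(\mathit{AVAIL}_{\mathrm{in}}(b) \setminus \mathit{KILL}(b)\bigr)
\]

Intuitively, a communication is redundant at a point if the data it would fetch is already available on every incoming control flow path; working with array sections rather than scalar expressions is what the available section descriptor representation is intended to make tractable.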