Since benchmarks drive computer science research and industry product development, which ones we use and how we evaluate them are key questions for the community. Despite complex runtime tradeoffs due to dynamic compilation and garbage collection required for Java programs, many evaluations still use methodologies developed for C, C++, and Fortran. SPEC, the dominant purveyor of benchmarks, compounded this problem by institutionalizing these methodologies for their Java benchmark suite. This paper recommends benchmark selection and evaluation methodologies, and introduces the DaCapo benchmarks, a set of open source, client-side Java benchmarks. We demonstrate that the complex interactions of (1) architecture, (2) compiler, (3) virtual machine, (4) memory management, and (5) application require more extensive evaluation than C, C++, and Fortran, which stress (4) much less and do not require (3). We use and introduce new value, time-series, and statistical metrics for static and dynamic properties such as code complexity, code size, heap composition, and pointer mutations. No benchmark suite is definitive, but these metrics show that DaCapo improves over SPEC Java in a variety of ways, including more complex code, richer object behaviors, and more demanding memory system requirements. This paper takes a step towards improving methodologies for choosing and evaluating benchmarks to foster innovation in system design and implementation for Java and other managed languages.
Many state-of-the-art garbage collectors are generational, collecting the young nursery objects more frequently than old objects. These collectors perform well because young objects tend to die at a higher rate than old ones. However, these collectors do not examine object lifetimes with respect to any particular program or allocation site. This paper introduces low-cost object sampling to dynamically determine lifetimes. The sampler marks an object and records its allocation site every n bytes of allocation. The collector then computes per-site nursery survival rates. Sampling degrades total performance by only 3% on average for sample rates of 256 bytes in Jikes RVM, a rate at which overall lifetime accuracy compares well with sampling every object. An adaptive collector can use this information to tune itself. For example, pretenuring decreases nursery collection work by allocating new, but long-lived, objects directly into the mature space. We introduce a dynamic pretenuring mechanism that detects long-lived allocation sites and pretenures them, given sufficient samples. To react to phase changes, it occasionally backsamples. As with previous online pretenuring, consistent performance improvements on the SPECjvm98 benchmarks are difficult to attain, since only two combine sufficient allocation load with high nursery survival. Our pretenuring system consistently improves one of these, _213_javac, by 2% to 9% of total time by decreasing collection time by over a factor of two. Sampling and pretenuring overheads slow down all the others. This paper thus provides an efficient sampling mechanism that accurately predicts lifetimes, but leaves open optimization policies that can exploit this information.
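The sampling and pretenuring scheme described above can be sketched in miniature: sample one object every n bytes of allocation, tally per-site nursery survival, and pretenure a site once it has enough samples and a high survival rate. This is a minimal illustration, not the paper's Jikes RVM implementation; the class and method names (`SiteSampler`, `recordAllocation`, `shouldPretenure`) and the threshold parameters are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of byte-periodic allocation-site sampling with a
// pretenuring decision. All names and parameters are illustrative.
public class SiteSampler {
    private final int sampleInterval;          // sample one object every n bytes
    private long bytesSinceSample = 0;
    // allocation site -> {objects sampled, sampled objects that survived a nursery GC}
    private final Map<String, long[]> siteStats = new HashMap<>();

    public SiteSampler(int sampleIntervalBytes) {
        this.sampleInterval = sampleIntervalBytes;
    }

    // Called on each allocation; returns true if this object is sampled (marked).
    public boolean recordAllocation(String site, int sizeBytes) {
        bytesSinceSample += sizeBytes;
        if (bytesSinceSample >= sampleInterval) {
            bytesSinceSample -= sampleInterval;
            siteStats.computeIfAbsent(site, s -> new long[2])[0]++;
            return true;
        }
        return false;
    }

    // Called during nursery collection for each marked object found alive.
    public void recordSurvivor(String site) {
        siteStats.computeIfAbsent(site, s -> new long[2])[1]++;
    }

    // Fraction of sampled objects from this site that survived a nursery collection.
    public double survivalRate(String site) {
        long[] s = siteStats.get(site);
        return (s == null || s[0] == 0) ? 0.0 : (double) s[1] / s[0];
    }

    // A pretenuring policy might allocate a site's objects directly into the
    // mature space once it has sufficient samples and high nursery survival.
    public boolean shouldPretenure(String site, double threshold, long minSamples) {
        long[] s = siteStats.get(site);
        return s != null && s[0] >= minSamples && (double) s[1] / s[0] >= threshold;
    }
}
```

A real collector would also occasionally backsample a pretenured site, as the paper describes, so the policy can react to phase changes; that feedback loop is omitted here for brevity.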
Applications continue to increase in size and complexity, which makes debugging and program understanding more challenging. Programs written in managed languages, such as Java, C#, and Ruby, further exacerbate this challenge because they tend to encode much of their state in the heap. This paper introduces dynamic shape analysis, which seeks to characterize data structures in the heap by dynamically summarizing the object pointer relationships and detecting dynamic degree metrics based on class. The analysis identifies recursive data structures, automatically discovers dynamic degree metrics, and reports errors when degree metrics are violated. Uses of dynamic shape analysis include helping programmers find data structure errors during development, generating assertions for verification with static or dynamic analysis, and detecting subtle errors in deployment. We implement dynamic shape analysis in a Java Virtual Machine (JVM). Using SPECjvm and DaCapo benchmarks, we show that most objects in the heap are part of recursive data structures that maintain strong dynamic degree metrics. We show that once dynamic shape analysis establishes degree metrics from correct executions, it can find automatically inserted errors on subsequent executions in microbenchmarks. These results suggest it can be used in deployment to improve software reliability.
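The core idea of per-class degree metrics can be illustrated with a small sketch: record, per class, the maximum pointer out-degree observed during correct runs, note which classes reference their own class (a sign of a recursive structure), and flag later observations that exceed the learned maximum. This is an assumed interface for illustration only (`DegreeTracker`, `observeEdge`, `violates`), not the paper's JVM-level analysis.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of dynamic degree metrics summarized per class.
// Names and structure are hypothetical, not the paper's implementation.
public class DegreeTracker {
    // class name -> maximum pointer out-degree observed so far
    private final Map<String, Integer> maxDegree = new HashMap<>();
    // class name -> whether an instance was seen pointing to its own class
    private final Map<String, Boolean> recursive = new HashMap<>();

    // Record that an instance of `cls` was observed holding `outDegree`
    // pointers, with `targetCls` the class of one referent.
    public void observeEdge(String cls, String targetCls, int outDegree) {
        maxDegree.merge(cls, outDegree, Math::max);
        if (cls.equals(targetCls)) {
            recursive.put(cls, true); // self-referential class: recursive data structure
        }
    }

    public int maxOutDegree(String cls) {
        return maxDegree.getOrDefault(cls, 0);
    }

    public boolean isRecursive(String cls) {
        return recursive.getOrDefault(cls, false);
    }

    // After metrics are established from correct executions, an out-degree
    // above the learned maximum is reported as a likely data structure error.
    public boolean violates(String cls, int observedDegree) {
        return observedDegree > maxDegree.getOrDefault(cls, Integer.MAX_VALUE);
    }
}
```

For example, a doubly linked list node class would be marked recursive with a learned out-degree of two; a node later observed with three same-class pointers would be flagged as a violation.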
Evaluation methodology underpins all innovation in experimental computer science. It requires relevant workloads, appropriate experimental design, and rigorous analysis. Unfortunately, methodology is not keeping pace with the changes in our field. The rise of managed languages such as Java, C#, and Ruby in the past decade and the imminent rise of commodity multicore architectures for the next decade pose new methodological challenges that are not yet widely understood. This paper explores the consequences of our collective inattention to methodology on innovation, makes recommendations for addressing this problem in one domain, and provides guidelines for other domains. We describe benchmark suite design, experimental design, and analysis for evaluating Java applications. For example, we introduce new criteria for measuring and selecting diverse applications for a benchmark suite. We show that the complexity and nondeterminism of the Java runtime system make experimental design a first-order consideration, and we recommend mechanisms for addressing complexity and nondeterminism. Drawing on these results, we suggest how to adapt methodology more broadly. To continue to deliver innovations, our field needs to significantly increase participation in and funding for developing sound methodological foundations.
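One concrete mechanism for coping with runtime nondeterminism, in the spirit of the analysis recommendations above, is to repeat each measurement and report a mean with a confidence interval rather than a single number. The sketch below uses a normal-approximation 95% interval (z = 1.96); a Student's t interval would be more appropriate for very few runs. The class and method names are illustrative, not from the paper.

```java
// Illustrative sketch: summarizing repeated benchmark runs with a mean
// and a normal-approximation 95% confidence interval. Names are hypothetical.
public class RunSummary {
    public static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    // Sample standard deviation (divides by n - 1).
    public static double stddev(double[] xs) {
        double m = mean(xs), ss = 0;
        for (double x : xs) ss += (x - m) * (x - m);
        return Math.sqrt(ss / (xs.length - 1));
    }

    // Returns {lower, upper} bounds of an approximate 95% confidence
    // interval for the mean, using the normal z-value 1.96.
    public static double[] confidenceInterval95(double[] xs) {
        double m = mean(xs);
        double half = 1.96 * stddev(xs) / Math.sqrt(xs.length);
        return new double[] { m - half, m + half };
    }
}
```

Under this discipline, two systems are distinguished only when their intervals do not overlap, which guards against mistaking run-to-run noise from JIT compilation and garbage collection for a real difference.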