Numerical weather prediction (NWP) experiments can be complex and time-consuming; results depend on computational environments and numerous input parameters. Delays in learning and in obtaining research results are inevitable. Students face disproportionate effort in the classroom or when beginning graduate-level NWP research. Published NWP research is generally not reproducible, introducing uncertainty and slowing efforts that build on past results. This work exploits the rapid emergence of software container technology to produce a transformative research and education environment. The Weather Research and Forecasting (WRF) Model anchors a set of linked Linux-based containers that include software to initialize and run the model, to analyze results, and to serve output to collaborators. The containers are demonstrated with a WRF simulation of Hurricane Sandy. The demonstration illustrates the following: 1) how the often-difficult exercise of compiling WRF and its many dependencies is eliminated, 2) how sharing containers provides identical environments for conducting research, 3) that numerically reproducible results are easily obtainable, and 4) how uncertainty in the results can be isolated from uncertainty arising from differences between computing systems. Numerical experiments designed to simultaneously measure numerical reproducibility and sensitivity to compiler optimization provide guidance for interpreting NWP research. Reproducibility is independent of the operating system and hardware: results here show numerically identical output on all computing platforms tested. Performance reproducibility is also demonstrated. The result is an infrastructure capable of accelerating classroom learning, graduate research, and collaborative science.
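The claim of numerically identical output across platforms can be verified mechanically by hashing the model's output files and comparing digests. A minimal sketch (the function names and the idea of hashing NetCDF output files are illustrative, not part of the paper's tooling):

```python
import hashlib

def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def outputs_identical(paths):
    """True if every listed output file has the same digest,
    i.e. the runs produced bitwise-identical results."""
    digests = {file_digest(p) for p in paths}
    return len(digests) == 1
```

Running the same containerized WRF case on two machines and passing both output files to `outputs_identical` distinguishes true bitwise reproducibility from merely "close" floating-point agreement.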
Evaluating experimental results in the field of computer systems is a challenging task, largely because of the frequent software and hardware changes that computational environments undergo. In this position paper, we analyze salient features of container technology that, if leveraged correctly, can help reduce the complexity of reproducing experiments in systems research. We present a use case in the area of distributed storage systems to illustrate the extensions that we envision, mainly in terms of container management infrastructure. We also discuss the benefits and limitations of using containers as a way of reproducing research in other areas of experimental systems research.
Improving the performance and functionality of database system optimizers requires experimentation on real customer data. Often these data are sensitive, and the only way to retain them is to apply a non-reversible transformation that obfuscates them. However, for the database optimizer to generate exactly the same query plans as it would for the sensitive data, the transformation must preserve the ordering of values and key properties of the data distribution. Unfortunately, existing data obfuscation techniques do not preserve all of these properties and are therefore not applicable in this context. In this paper we present a Desensitizer tool that we developed for optimizer performance experiments on HP's Neoview high-availability data warehousing product. The tool is based on novel numeric and string desensitization algorithms that are agnostic to the database system. We explain the core concepts behind the algorithms, how they preserve the required data properties, and important implementation considerations. We present the architecture of the Desensitizer tool and the results of the extensive validation that we conducted.
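The order-preservation requirement can be illustrated with a toy transformation: any strictly increasing map hides the raw values while leaving every comparison, and therefore every ordering decision the optimizer makes, unchanged. This sketch is not the paper's algorithm; the affine form and parameter names are assumptions for illustration only:

```python
def desensitize_numeric(values, scale=3.0, shift=1000.0):
    """Toy order-preserving obfuscation of numeric values.

    A strictly increasing affine map guarantees x < y <=> f(x) < f(y),
    so orderings and ties survive the transformation; the scale/shift
    parameters would be discarded after use so the map cannot be
    trivially inverted. (A real desensitizer must also preserve
    distribution properties, which this toy version does not address.)
    """
    assert scale > 0, "scale must be positive to preserve order"
    return [scale * v + shift for v in values]
```

Because ties are preserved as well as order, histogram bucket boundaries computed on the desensitized column fall on the same ranks as on the original data.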
The rise of Integrated Application Workflows (IAWs) for processing data prior to storage on persistent media prompts the need to incorporate features that reproduce many of the semantics of persistent storage devices. One such feature is the ability to manage data sets as chunks with natural barriers between different data sets. Toward that end, we need a mechanism to ensure that data moved to an intermediate storage area is both complete and correct before allowing access by other processing components. The Doubly Distributed Transactions (D²T) protocol offers such a mechanism. The initial development [9] suffered from scalability limitations and undue requirements on server processes. The current version addresses these limitations and demonstrates scalability with low overhead.
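The complete-and-correct gate described above can be sketched in spirit as a visibility barrier: a chunk becomes readable only once every expected writer has committed a piece whose checksum verifies. The class, method names, and checksum scheme below are illustrative assumptions, not the D²T protocol itself:

```python
import hashlib

class ChunkGate:
    """Toy stand-in for a D²T-style visibility barrier.

    Readers may see the chunk only when (a) all expected writers have
    committed their pieces (completeness) and (b) every piece matches
    its declared checksum (correctness).
    """

    def __init__(self, expected_writers):
        self.expected = set(expected_writers)
        self.committed = {}  # writer -> (data, declared_digest)

    def commit(self, writer, data, declared_digest):
        self.committed[writer] = (data, declared_digest)

    def visible(self):
        # Incomplete: some expected writer has not committed yet.
        if set(self.committed) != self.expected:
            return False
        # Correct: every committed piece matches its declared checksum.
        return all(
            hashlib.sha256(data).hexdigest() == digest
            for data, digest in self.committed.values()
        )
```

A downstream analysis component would poll `visible()` (or be notified) and only then read the intermediate storage area, mimicking the barrier semantics that persistent storage provides implicitly.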