Volunteer Computing (VC) is a paradigm that takes advantage of idle cycles from computing resources donated by volunteers and connected through the Internet to compute large-scale, loosely coupled simulations. A big challenge in VC projects is the scheduling of work-units across heterogeneous, volatile, and error-prone computers. The design of efficient scheduling policies for VC projects involves subjective and time-demanding tuning that is driven by knowledge of the project designer. VC projects are in need of a faster and project-independent method to automate the scheduling design. To automatically generate a scheduling policy, we must explore the extremely large space of syntactically valid policies. Given the size of this search space, exhaustive search is not feasible. Thus in this paper we propose to solve the problem using an evolutionary method to automatically generate a set of scheduling policies that are project-independent, minimize errors, and maximize throughput in VC projects. Our method includes a genetic algorithm where the representation of individuals, the fitness function, and the genetic operators are specifically tailored to get effective policies in a short time. The effectiveness of our method is evaluated with SimBA, a Simulator of BOINC Applications. In contrast with manually designed scheduling policies that often perform well only for the specific project they were designed for and require months of tuning, our resulting scheduling policies provide better overall throughput across the different VC projects considered in this work and were generated by our method in a time window of one week.
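The evolutionary search described in this abstract can be illustrated with a minimal genetic-algorithm loop. This is only a sketch under stated assumptions: the bit-string genome, the `toy_fitness` function, and all parameter values are hypothetical placeholders. The paper's own individual representation, SimBA-based fitness function, and tailored genetic operators are not given in the abstract and are not reproduced here.

```python
import random


def evolve(fitness, genome_len=16, pop_size=30, generations=40,
           mutation_rate=0.05, seed=0):
    """Generic GA: tournament selection, one-point crossover, bit-flip mutation.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]

    def tournament():
        # Pick two distinct individuals at random and keep the fitter one.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        best = max(pop, key=fitness)
        next_pop = [best[:]]  # elitism: the best individual always survives
        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randint(1, genome_len - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Flip each gene independently with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)


def toy_fitness(genome):
    # Hypothetical stand-in for a simulator-based score (e.g. throughput):
    # here it simply counts the ones in the genome.
    return sum(genome)


best = evolve(toy_fitness)
print(best, toy_fitness(best))
```

In a real setting, the fitness call would be the expensive step (one simulation per candidate policy), which is why the choice of representation and operators matters for reaching good policies in a short time.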
As ribonucleic acid (RNA) molecules play important roles in many biological processes including gene expression and regulation, their secondary structures have been the focus of many recent studies. Despite the computing power of supercomputers, computationally predicting secondary structures with thermodynamic methods is still not feasible when the RNA molecules have long nucleotide sequences and include complex motifs such as pseudoknots. This paper presents RNAVLab (RNA Virtual Laboratory), a virtual laboratory for studying RNA secondary structures including pseudoknots that allows scientists to address this challenge. Two important case studies show the versatility and functionalities of RNAVLab. The first study quantifies its capability to rebuild longer secondary structures from motifs found in systematically sampled nucleotide segments. The extensive sampling and predictions are made feasible in a short turnaround time because of the grid technology used. The second study shows how RNAVLab allows scientists to study the viral RNA genome replication mechanisms used by members of the virus family Nodaviridae.
The processes by which organic life is consumed and reborn in a complex ecosystem were investigated through a multiomics approach applied to the tripartite Xenorhabdus bacterium–Steinernema nematode–Galleria insect symbiosis. Trophic analyses demonstrate that the primary consumers of the insect are the bacteria and that the nematode in turn consumes the bacteria.
Abstract. Soil moisture is key for understanding soil–plant–atmosphere interactions. We provide a soil moisture pattern recognition framework to increase the spatial resolution and fill gaps of the ESA-CCI (European Space Agency Climate Change Initiative v4.5) soil moisture dataset, which contains > 40 years of satellite soil moisture global grids with a spatial resolution of ∼ 27 km. We use terrain parameters coupled with bioclimatic and soil type information to predict finer-grained (i.e., downscaled) satellite soil moisture. We assess the impact of terrain parameters on the prediction accuracy by cross-validating downscaled soil moisture with and without the support of bioclimatic and soil type information. The outcome is a dataset of gap-free global mean annual soil moisture predictions and associated prediction variances for 28 years (1991–2018) across 15 km grids. We use independent in situ records from the International Soil Moisture Network (ISMN, 987 stations) and in situ precipitation records (171 additional stations) only for evaluating the new dataset. Cross-validated correlation between observed and predicted soil moisture values varies from r = 0.69 to r = 0.87 with root mean squared errors (RMSEs, m³ m⁻³) around 0.03 and 0.04. Our soil moisture predictions improve (a) the correlation with the ISMN (when compared with the original ESA-CCI dataset) from r = 0.30 (RMSE = 0.09, unbiased RMSE (ubRMSE) = 0.37) to r = 0.66 (RMSE = 0.05, ubRMSE = 0.18) and (b) the correlation with local precipitation records across boreal (from r < 0.3 up to r = 0.49) or tropical areas (from r < 0.3 to r = 0.46) which are currently poorly represented in the ISMN.
Temporal trends show a decline of global annual soil moisture using (a) data from the ISMN (−1.5 [−1.8, −1.24] %), (b) associated locations from the original ESA-CCI dataset (−0.87 [−1.54, −0.17] %), (c) associated locations from predictions based on terrain parameters (−0.85 [−1.01, −0.49] %), and (d) associated locations from predictions including bioclimatic and soil type information (−0.68 [−0.91, −0.45] %). We provide a new soil moisture dataset that has no gaps and higher granularity together with validation methods and a modeling approach that can be applied worldwide (Guevara et al., 2020, https://doi.org/10.4211/hs.9f981ae4e68b4f529cdd7a5c9013e27e).
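For readers unfamiliar with the evaluation metrics named in this abstract (correlation r, RMSE, and ubRMSE), the sketch below computes them from paired observed and predicted values. It uses the commonly used definitions, where ubRMSE is the RMSE after removing the mean bias; the abstract does not state the exact formulas the authors applied, so treat these as assumptions. The function name and the sample values are hypothetical.

```python
import math


def evaluation_metrics(observed, predicted):
    """Return Pearson r, bias, RMSE, and unbiased RMSE for paired samples."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n

    # Mean bias: positive when predictions are wetter than observations.
    bias = mean_pred - mean_obs

    # RMSE on the raw values.
    rmse = math.sqrt(
        sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)

    # ubRMSE: RMSE of the anomalies, i.e. after removing each series' mean.
    ubrmse = math.sqrt(
        sum(((p - mean_pred) - (o - mean_obs)) ** 2
            for o, p in zip(observed, predicted)) / n)

    # Pearson correlation coefficient.
    cov = sum((o - mean_obs) * (p - mean_pred)
              for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    r = cov / math.sqrt(var_obs * var_pred)

    return {"r": r, "bias": bias, "rmse": rmse, "ubrmse": ubrmse}


# Synthetic example (m³ m⁻³): predictions with a constant +0.05 offset.
obs = [0.10, 0.20, 0.30, 0.40]
pred = [0.15, 0.25, 0.35, 0.45]
print(evaluation_metrics(obs, pred))
```

With a constant offset, the correlation is perfect and the whole error is bias, so RMSE equals the offset while ubRMSE is essentially zero. This is why the metrics are reported together.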
To trust findings in computational science, scientists need workflows that trace the data provenance and support results explainability. As workflows become more complex, tracing data provenance and explaining results become harder to achieve. In this paper, we propose a computational environment that automatically creates a workflow execution's record trail and invisibly attaches it to the workflow's output, enabling data traceability and results explainability. Our solution transforms existing container technology, includes tools for automatically annotating provenance metadata, and allows effective movement of data and metadata across the workflow execution. We demonstrate the capabilities of our environment with the study of SOMOSPIE, an earth science workflow. Through a suite of machine learning modeling techniques, this workflow predicts soil moisture values from the 27 km resolution satellite data down to higher resolutions necessary for policy making and precision agriculture. By running the workflow in our environment, we can identify the causes of different accuracy measurements for predicted soil moisture values in different resolutions of the input data and link different results to different machine learning methods used during the soil moisture downscaling, all without requiring scientists to know aspects of workflow design and implementation.
Neural networks (NNs) are used in high-performance computing and high-throughput analysis to extract knowledge from datasets. Neural architecture search (NAS) automates NN design by generating, training, and analyzing thousands of NNs. However, NAS requires massive computational power for NN training. To address challenges of efficiency and scalability, we propose PENGUIN, a decoupled fitness prediction engine that informs the search without interfering in it. PENGUIN uses parametric modeling to predict the fitness of NNs. Existing NAS methods and parametric modeling functions can be plugged into PENGUIN to build flexible NAS workflows. Through this decoupling and flexible parametric modeling, PENGUIN reduces training costs: it predicts the fitness of NNs, enabling NAS to terminate training NNs early. Early termination increases the number of NNs that fixed compute resources can evaluate, thus giving NAS additional opportunity to find better NNs. We assess the effectiveness of our engine on 6,000 NNs across three diverse benchmark datasets and three state-of-the-art NAS implementations using the Summit supercomputer. Augmenting these NAS implementations with PENGUIN can increase throughput by a factor of 1.6 to 7.1. Furthermore, walltime tests indicate that PENGUIN can reduce training time by a factor of 2.5 to 5.3.
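The idea of predicting fitness from a partial training curve and stopping early can be sketched as follows. This is not PENGUIN's implementation: the curve family `a - b / epoch`, the convergence rule, and every name and threshold below are assumptions chosen for illustration, since the abstract only says that a pluggable parametric function is fitted.

```python
def fit_inverse_curve(epochs, accuracies):
    """Least-squares fit of accuracy ~ a - b / epoch (linear in 1/epoch)."""
    xs = [1.0 / e for e in epochs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(accuracies) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracies))
    slope = sxy / sxx
    a = mean_y - slope * mean_x  # asymptotic accuracy
    b = -slope                   # how quickly the curve approaches it
    return a, b


def predict_final_fitness(accuracies, final_epoch):
    """Extrapolate the fitted curve to the planned final epoch."""
    epochs = range(1, len(accuracies) + 1)
    a, b = fit_inverse_curve(epochs, accuracies)
    return a - b / final_epoch


def should_stop_early(accuracies, final_epoch, window=3, tolerance=0.005):
    """Stop once the last `window` predictions agree within `tolerance`.

    Returns (stop, predicted_fitness); predicted_fitness is None if training
    should continue.
    """
    if len(accuracies) < window + 1:  # each fit needs at least two points
        return False, None
    start = len(accuracies) - window + 1
    preds = [predict_final_fitness(accuracies[:k], final_epoch)
             for k in range(start, len(accuracies) + 1)]
    if max(preds) - min(preds) <= tolerance:
        return True, preds[-1]
    return False, None


# Synthetic curve that follows the assumed family exactly.
curve = [0.9 - 0.5 / e for e in range(1, 7)]
print(should_stop_early(curve, final_epoch=20))
```

Because the predictor only reads the accuracies observed so far, it can sit beside a NAS loop without changing how the search selects or mutates architectures, which is the decoupling the abstract emphasizes.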
Abstract. Soil moisture is key for quantifying soil-atmosphere interactions, and the ESA-CCI (European Space Agency Climate Change Initiative) provides historical (> 30 years) satellite soil moisture gridded data at the global scale. We evaluate an alternative approach to increase the spatial resolution of the original ESA-CCI soil moisture measurements from 27 km to 15 km grids by coupling machine learning (ML) algorithms with information from digital terrain analysis at the global scale. We modeled mean annual ESA-CCI soil moisture values across 26 years of available data (1991–2016) using an ML-based kernel method and multiple terrain parameters (e.g., slope, wetness index) as prediction factors. We used ground information from the International Soil Moisture Network (ISMN, n = 13376) for evaluating soil moisture predictions. We provide gap-free mean annual soil moisture predictions, which increase the spatial resolution of the ESA-CCI soil moisture product by nearly 50 %. Our predictions showed a cross-validated explained variance of 0.69–0.87 % and a root mean squared error (RMSE) of 0.04 m³/m³. We found no significant differences between the ESA-CCI and our predictions, but we found discrepancies between multiple evaluation metrics (e.g., bias vs efficiency) when comparing the ESA-CCI with the ISMN. We found a negative bias (−0.01 to −0.08 m³/m³) when comparing the ISMN values with the ESA-CCI and our predictions across the analyzed years. A temporal analysis, using a robust trend detection strategy (i.e., Theil-Sen estimator), suggests a decline of soil moisture at the global scale that is consistent in both gridded datasets and field measurements of soil moisture, varying from −0.7 [−0.77, −0.62] % in the ESA-CCI product, −0.9 [−1.01, −0.8] % in the downscaled predictions, and −1.6 [−1.7, −1.5] % in the ISMN.
The soil moisture predictions provided here (http://www.hydroshare.org/resource/b940b704429244a99f902ff7cb30a31f) could be useful for quantifying soil moisture spatial and temporal dynamics across areas with low availability of soil moisture information in the original ESA-CCI database.
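The Theil-Sen estimator mentioned in this abstract takes the median of the slopes between all pairs of points, which makes the trend robust to outliers. The sketch below is a minimal standard-library version run on synthetic values; the function name and the data are hypothetical and do not come from the ESA-CCI or ISMN records, and it does not compute the confidence intervals reported in the abstract.

```python
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Return (slope, intercept) using the median of pairwise slopes."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]  # skip pairs with identical x to avoid division by zero
    ]
    slope = median(slopes)
    # A common choice of intercept: median of the residual offsets.
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept


# Synthetic annual series: steady decline of 0.001 per year plus one outlier.
years = list(range(1991, 2001))
moisture = [0.30 - 0.001 * (yr - 1991) for yr in years]
moisture[5] += 0.10  # a single anomalous year

slope, intercept = theil_sen(years, moisture)
print(f"estimated trend per year: {slope:.4f}")
```

An ordinary least-squares fit on the same series would be pulled toward the anomalous year, whereas the median of pairwise slopes stays at the underlying rate of decline.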