Abstract. Grids face the challenge of seamlessly integrating grid power into everyday use. One critical component of this integration is responsiveness, the capacity to support on-demand computing and interactivity. Grid scheduling is involved at two levels in providing responsiveness: the policy level and the implementation level. The main contributions of this paper are as follows. First, we present a detailed analysis of the performance of the EGEE grid with respect to responsiveness. Second, we examine two user-level schedulers located between the general scheduling layer and the application layer. These are the DIANE (DIstributed ANalysis Environment) framework, a general-purpose overlay system, and a specialized, embedded scheduler for gPTM3D, an interactive medical image analysis application. Finally, we define and demonstrate a virtualization scheme, which achieves guaranteed turnaround time, enables schedulability analysis, and provides the basis for differentiated services. Both schedulers target a brokering-based system organized as a federation of batch-scheduled clusters, and an EGEE implementation is described.
Abstract. The medical community produces and manipulates a tremendous volume of digital data, for which computerized archiving, processing and analysis are needed. Grid infrastructures are promising for dealing with the challenges arising in computerized medicine, but the manipulation of medical data on such infrastructures faces both the problem of interconnecting medical information systems to grid middleware and that of preserving patients' privacy in a wide, distributed multi-user system. These constraints often limit the use of grids for manipulating sensitive medical data. This paper describes our design of a medical data management system that takes advantage of the advanced gLite data management services, developed in the context of the EGEE project, to fulfill the stringent needs of the medical community. It ensures medical data protection through strict data access control, anonymization and encryption. The multi-level access control provides the flexibility needed for implementing complex medical use cases. Data anonymization prevents the exposure of the most sensitive data to unauthorized users, and data encryption guarantees data protection even when it is stored at remote sites. Moreover, the developed prototype provides a grid Storage Resource Manager (SRM) interface to standard medical DICOM servers, thereby enabling transparent access to medical data without interfering with medical practice.

© 2007 Kluwer Academic Publishers. Printed in the Netherlands. J. Montagnat, A. Frohner, D. Jouvenot, C. Pera et al.

Keywords: Secure grid storage, gLite middleware, medical data management

1. Context and Objectives

Many scientific areas benefit from the large and distributed storage capabilities provided by grid infrastructures. On top of physical storage resources, the EGEE [17] grid data management system eases the manipulation of large data volumes and provides high-level functionality such as data distribution, replication and optimized access.
To build a data management system that can adapt to heterogeneous data storage resources (disks, tapes, silos...), the grid community has adopted standard interfaces to virtualize the underlying resources. In particular, gLite [23], the next-generation EGEE middleware, has adopted the Storage Resource Manager (SRM) interface [29], standardized in the context of the Open Grid Forum [28]. The SRM's primary concern is to provide efficient access to large volumes of data. It provides, among other services, prefetching of data files recorded on secondary storage, management of storage space, and reservation of storage resources. However, it does not provide any access control or protection of data, which severely limits its usability for applications manipulating sensitive data. In this paper, we address the problem of sensitive data management on the EGEE grid infrastructure and we introduce a data management service designed to handle medical records on grids. We first motivate our approach through an in-depth requirement analysis of data management in the medical ar...
Data Science is an emerging field of science that requires a multidisciplinary approach and is based on Big Data and data-intensive technologies, which together provide a basis for effective use of data-driven research and economy models. Modern data-driven research and industry require new types of specialists who are capable of supporting all stages of the data lifecycle, from data production and input to data processing and the delivery of actionable results, visualisation and reporting, which can be jointly defined as the Data Science professions family. The education and training of Data Scientists currently lacks a commonly accepted, harmonized instructional model that reflects all the multidisciplinary knowledge and competences required of Data Science practitioners in modern, data-driven research and the digital economy. The educational model and approach should also address the different needs of future professionals, covering both theoretical knowledge and practical skills, which must be supported by a corresponding education infrastructure and educational lab environment. Under modern conditions of fast technology change and strong skills demand, Data Science education and training should be customizable and delivered in multiple forms, also providing sufficient data lab facilities for practical training. This paper discusses both aspects: building a customizable Data Science curriculum for different types of learners, and proposing a hybrid model for virtual labs that can combine local university facilities with cloud-based Big Data and data analytics facilities and services on demand. The proposed approach is based on the EDISON Data Science Framework (EDSF) developed in the EU-funded project EDISON and the CYCLONE cloud automation systems being developed in another EU-funded project, CYCLONE.
Two production models are candidates for e-science computing: grids enable hardware and software sharing; clouds propose dynamic resource provisioning (elastic computing). Organized sharing is a fundamental requirement for large scientific collaborations; responsiveness, the ability to provide good response time, is a fundamental requirement for seamless integration of the large scale computing resources into everyday use. This paper focuses on a model-free resource provisioning strategy supporting both scenarios. The provisioning problem is modeled as a continuous action-state space, multi-objective reinforcement learning problem, under realistic hypotheses; the high level goals of users, administrators, and shareholders are captured through simple utility functions. We propose an implementation of this reinforcement learning framework, including an approximation of the value function through an Echo State Network, and we validate it on a real dataset.
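The abstract above mentions approximating the value function of the provisioning problem with an Echo State Network (ESN). As an illustration only, and not the paper's actual implementation, the sketch below shows the core ESN idea applied to value estimation: a fixed random recurrent "reservoir" is driven by the observed state, and only a linear readout is trained, here with a simple TD(0) update. All names, dimensions and the placeholder state/reward generation are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

class ESNValueApprox:
    """Minimal Echo State Network used as a value-function approximator."""

    def __init__(self, state_dim, reservoir_size=100, spectral_radius=0.9):
        # Fixed random input and recurrent weights (never trained).
        self.W_in = rng.uniform(-0.5, 0.5, (reservoir_size, state_dim))
        W = rng.uniform(-0.5, 0.5, (reservoir_size, reservoir_size))
        # Rescale so the largest eigenvalue magnitude is spectral_radius < 1,
        # giving the reservoir the "echo state" (fading memory) property.
        W *= spectral_radius / max(abs(np.linalg.eigvals(W)))
        self.W = W
        self.x = np.zeros(reservoir_size)   # reservoir state
        self.w_out = np.zeros(reservoir_size)  # trained linear readout
        self.lr = 0.01

    def step(self, state):
        """Advance the reservoir with a new observation; return features."""
        self.x = np.tanh(self.W_in @ state + self.W @ self.x)
        return self.x.copy()

# Along a trajectory, only the linear readout w_out is updated via TD(0).
esn = ESNValueApprox(state_dim=3)
phi = esn.step(np.array([0.2, 0.5, 0.1]))
for _ in range(50):
    s_next = rng.uniform(0, 1, 3)       # placeholder state transition
    reward = float(s_next.sum())        # placeholder utility-based reward
    phi_next = esn.step(s_next)
    td_err = reward + 0.95 * esn.w_out @ phi_next - esn.w_out @ phi
    esn.w_out += esn.lr * td_err * phi
    phi = phi_next
```

Training only the readout is what makes ESNs attractive here: the recurrent part provides temporal features for free, so the learning problem reduces to linear regression on the reservoir state.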