The many scientific user communities have worked with data for many years and therefore already operate a wide variety of data infrastructures in production today. The aim of this paper is thus not to create one new general data architecture, which individual user communities would be unlikely to adopt. Instead, this contribution designs a reference model with abstract entities that is able to federate existing concrete infrastructures under one umbrella. A reference model is an abstract framework for understanding significant entities and the relationships between them; it thus helps to understand existing data infrastructures by comparing them in terms of functionality, services, and boundary conditions. An architecture derived from such a reference model can then be used to create a federated architecture that builds on the existing infrastructures and aligns them with a common vision. In this contribution, that common vision is named 'ScienceTube', and it defines the high-level goal that the reference model aims to support. This paper describes how a well-focused use case around data replication, and its related activities in the EUDAT project, aims to provide a first step towards this vision. Concrete stakeholder requirements from scientific end users, such as those of the European Strategy Forum on Research Infrastructures (ESFRI) projects, underpin this contribution and provide clear evidence that the EUDAT activities are bottom-up, offering real solutions to the 'high-level big data challenges' that are so often only described. The federated approach followed here, which takes advantage of community and data centers with large computational resources, further shows how data replication services enable data-intensive computing on terabytes or even petabytes of data emerging from ESFRI projects.
The Human Brain Project (HBP) (https://humanbrainproject.eu/) is a large-scale flagship project funded by the European Commission with the goal of establishing a research infrastructure for brain science. This research infrastructure is currently being realised and will be called EBRAINS (https://ebrains.eu/). The wide-ranging EBRAINS services for the brain research communities require diverse access, processing and storage capabilities. As a result, EBRAINS will rely heavily on e-infrastructure services. The HBP led to the creation of Fenix (https://fenix-ri.eu/), a collaboration of five European supercomputing centres that provides a set of federated e-infrastructure services to EBRAINS. The Fenix architecture has been designed specifically to address the need for a wide spectrum of services, ranging from high performance computing (HPC) through on-demand cloud technologies to identity and access federation, in order to facilitate access to and use of distributed e-infrastructure resources. In this article we describe the underlying concepts for an audience of computational science end-users and developers of domain-specific applications, workflows and platform services. To exemplify the use of Fenix, we discuss selected use cases demonstrating how brain researchers can use the offered infrastructure services and describe how access to these resources can be obtained.
JUSUF is a petaflop supercomputer operated by the Jülich Supercomputing Centre at Forschungszentrum Jülich as a European supercomputing and cloud resource. JUSUF was funded via the ICEI project and primarily serves the Human Brain Project and PRACE via ICEI and the Fenix Research Infrastructure. The system consists of two parts: an HPC cluster partition and an Infrastructure-as-a-Service cloud partition. The system entered its production phase in spring 2020. It is based on the Bull X400 product family with AMD Rome processors, partially accelerated by Nvidia V100 GPUs, and uses an Nvidia Mellanox HDR InfiniBand interconnect.
Big Data challenges often require the application of new data processing paradigms (such as MapReduce) and corresponding software solutions (e.g., Hadoop). This trend puts pressure both on cyber-infrastructure providers (to quickly integrate new services) and on infrastructure users (to quickly learn to use new tools). In this paper we present the concept of the DARIAH Generic Workspace for Big Data Processing in eHumanities, which alleviates these problems. It establishes a common integration layer, thereby enabling quick integration of new services, and, by providing unified interfaces, allows users to start using new tools without learning their internal details. We describe the overall architecture and the implementation details of the working prototype. The presented concept is generic enough to be applied in other emerging cyber-infrastructures for the humanities.
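To make the idea of a common integration layer with unified interfaces more concrete, here is a minimal sketch under stated assumptions. It is not the DARIAH prototype's code or API: the names `Workspace`, `ProcessingService` and `LocalMapReduceService` are hypothetical, and a tiny in-memory MapReduce backend stands in for a real Hadoop service. The point it shows is that users submit work to a named service through one interface, while each backend is integrated by registering a single adapter.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

Mapper = Callable[[Any], Iterable[Tuple[Any, Any]]]
Reducer = Callable[[Any, List[Any]], Any]


class ProcessingService(ABC):
    """Hypothetical unified interface that every integrated backend implements."""

    @abstractmethod
    def run(self, records: Iterable[Any], mapper: Mapper, reducer: Reducer) -> Dict[Any, Any]:
        ...


class LocalMapReduceService(ProcessingService):
    """In-memory stand-in for a Hadoop-like backend (illustrative only)."""

    def run(self, records, mapper, reducer):
        # Map phase: emit (key, value) pairs and group them by key (the "shuffle").
        groups: Dict[Any, List[Any]] = defaultdict(list)
        for record in records:
            for key, value in mapper(record):
                groups[key].append(value)
        # Reduce phase: collapse each key's grouped values into one result.
        return {key: reducer(key, values) for key, values in groups.items()}


class Workspace:
    """Hypothetical integration layer: users address services by name only."""

    def __init__(self) -> None:
        self._services: Dict[str, ProcessingService] = {}

    def register(self, name: str, service: ProcessingService) -> None:
        # Integrating a new backend means registering one more adapter.
        self._services[name] = service

    def submit(self, name, records, mapper, reducer):
        # The user-facing call is identical regardless of the backend behind it.
        if name not in self._services:
            raise KeyError(f"no processing service registered as {name!r}")
        return self._services[name].run(records, mapper, reducer)


workspace = Workspace()
workspace.register("local-mapreduce", LocalMapReduceService())
texts = ["to be or not to be", "to see"]
counts = workspace.submit(
    "local-mapreduce",
    texts,
    mapper=lambda line: [(word, 1) for word in line.split()],
    reducer=lambda word, ones: sum(ones),
)
print(counts)  # {'to': 3, 'be': 2, 'or': 1, 'not': 1, 'see': 1}
```

In this sketch, adding another backend would only require a further `ProcessingService` subclass; the `submit` call that users write stays unchanged.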