Abstract: The DARE platform enables researchers and their developers to exploit more capabilities to handle complexity and scale in data, computation and collaboration. Today's challenges pose increasing and urgent demands for this combination of capabilities. To meet technical, economic and governance constraints, application communities must use shared digital infrastructure principally via virtualisation and mapping. This requires precise abstractions that retain their meaning while their implementations and infr…
“…We start from the Waveform pre-processing workflow in Figure 4. It is composed of simple stateless PEs receiving and producing waveform time-series according to a data format defined in the Obspy software package 8 [28]. The provenance type SeismoType, listed in Table II, can handle such a payload thanks to the implementation of its extractItemMetadata method, which uses the Obspy toolkit to access seismic data and metadata.…”
Section: A Seismic Analysis Workflows: Lineage Precision and Metadat… (mentioning)
confidence: 99%
“…It is in use for demanding applications in seismology and climate-impact research, for the VERCE 1 [7] and CLIPC 2 projects, and now the DARE project 3. The DARE architecture and vision, described in a contemporary paper [8], is the context for our recent achievements, illustrated with the seismic rapid assessment use case (Section VI).…”
We present a practical approach for provenance capturing in Data-Intensive workflow systems. It provides contextualisation by recording injected domain metadata with the provenance stream. It offers control over lineage precision, combining automation with specified adaptations. We address provenance tasks such as extraction of domain metadata, injection of custom annotations, accuracy and integration of records from multiple independent workflows running in distributed contexts. To allow such flexibility, we introduce the concepts of programmable Provenance Types and Provenance Configuration. Provenance Types handle domain contextualisation and allow developers to model lineage patterns by redefining API methods, composing easy-to-use extensions. Provenance Configuration, instead, enables users of a Data-Intensive workflow execution to prepare it for provenance capture, by configuring the attribution of Provenance Types to components and by specifying grouping into semantic clusters. This enables better searches over the lineage records. Provenance Types and Provenance Configuration are demonstrated in a system being used by computational seismologists. It is based on an extended provenance model, S-PROV.
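The split between Provenance Types and Provenance Configuration can be illustrated with a minimal Python sketch. Only the method name extractItemMetadata comes from the quoted text; the classes, the record helper, the component names and the sample data below are hypothetical stand-ins for illustration, not the API of the system the abstract describes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


class ProvenanceType:
    """Hypothetical base type: by default no domain metadata is extracted."""

    def extractItemMetadata(self, data: Any, port: str) -> List[Dict[str, Any]]:
        return []


class WaveformType(ProvenanceType):
    """Hypothetical type that redefines the method for a waveform-like payload."""

    def extractItemMetadata(self, data: Any, port: str) -> List[Dict[str, Any]]:
        return [{"station": data["station"], "npts": len(data["samples"]), "port": port}]


@dataclass
class ProvenanceConfiguration:
    """Hypothetical configuration: assigns types to components and groups them into clusters."""

    types_by_component: Dict[str, type] = field(default_factory=dict)
    clusters: Dict[str, str] = field(default_factory=dict)

    def type_for(self, component: str) -> ProvenanceType:
        # Components without an explicit assignment fall back to the base type.
        return self.types_by_component.get(component, ProvenanceType)()


def record(config: ProvenanceConfiguration, component: str, port: str, data: Any) -> Dict[str, Any]:
    """Build one lineage record for a data item emitted by a component."""
    ptype = config.type_for(component)
    return {
        "component": component,
        "cluster": config.clusters.get(component),
        "metadata": ptype.extractItemMetadata(data, port),
    }


# Made-up component names and payload, for illustration only.
config = ProvenanceConfiguration(
    types_by_component={"detrend": WaveformType},
    clusters={"detrend": "preprocess", "plot": "visualisation"},
)
trace = record(config, "detrend", "output", {"station": "AQU", "samples": [0.1, 0.2, 0.3]})
```

In this sketch the developer's work is confined to the type (what metadata to extract), while the user's work is confined to the configuration (which component gets which type, and which cluster it belongs to), so either can change without touching the workflow components themselves.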
“…Examples of commercial clouds being used in large-scale scientific contexts are found on both sides of the Atlantic: in the European Open Science Cloud 1 (EOSC) case as well as in the massive ongoing migration of data and other resources onto Amazon's AWS by NASA 2 . This work has been supported by the EU H2020 research and innovation programme under grant agreement No 777413. 1 https://ec.europa.eu/research/openscience/index.cfm?pg=open-science-cloud 2 https://aws.amazon.com/partners/success/nasa-image-library/ It follows that while potential for large scale data-driven experimentation increases, so does complexity. At the same time, making use of vendor-specific features may lead to lock-in.…”
Section: Dare: a Reflective Platform Designed To Enable… (mentioning)
confidence: 99%
“…6) Exposing all relevant functionality via a set of RESTful APIs that (1) effectively hide technical detail and (2) enable research developers to build solutions that exploit multiple underlying e-infrastructures with minimal effort. The overarching vision behind DARE as well as its main architectural considerations and components can be found in [2]. This paper describes the current technical instantiation in response to this vision, its main software components and their interactions.…”
Section: Dare: a Reflective Platform Designed To Enable… (mentioning)
confidence: 99%
“…As per its reference architecture [2], the DARE platform implements a logical knowledge-base as a series of stores and registries. The semantification and tighter integration of these constituent stores and registries is ongoing work.…” (Footnotes carried in the excerpt: 18 https://rook.io/ 19 https://eudat.eu/services/b2drop)
Section: Knowledge Base Provenance Tracking and Metadata (mentioning)
The DARE platform has been designed to help research developers deliver user-facing applications and solutions over diverse underlying e-infrastructures, data and computational contexts. The platform is Cloud-ready, and relies on the exposure of APIs, which are suitable for raising the abstraction level and hiding complexity. It implements the cataloguing and execution of fine-grained and Python-based dispel4py workflows as services. Reflection is achieved via a logical knowledge base, comprising multiple internal catalogues, registries and semantics, while it supports persistent and pervasive data provenance. This paper presents design and implementation aspects of the DARE platform, and provides directions for future development.
Collaboration across institutional boundaries is widespread and increasing today. It depends on federations sharing data that often have governance rules or external regulations restricting their use. However, the handling of data governance rules (a.k.a. data-use policies) remains manual, time-consuming and error-prone, limiting the rate at which collaborations can form and respond to challenges and opportunities, inhibiting citizen science and reducing data providers' trust in compliance. Using an automated system to facilitate compliance handling substantially reduces the time needed for such non-mission work, thereby accelerating collaboration and improving productivity. We present a framework, Dr.Aid, that helps individuals, organisations and federations comply with data rules, using automation to track which rules are applicable as data is passed between processes and as derived data is generated. It encodes data-governance rules using a formal language and performs reasoning on multi-input-multi-output data-flow graphs in decentralised contexts. We test its power and utility by working with users performing cyclone tracking and earthquake modelling to support mitigation and emergency response. We query standard provenance traces to detach Dr.Aid from details of the tools and systems they are using, as these inevitably vary across members of a federation and through time. We evaluate the model in three aspects by encoding real-life data-use policies from diverse fields, showing its capability for real-world usage and its advantages compared with traditional frameworks. We argue that this approach will lead to more agile, more productive and more trustworthy collaborations and show that the approach can be adopted incrementally. This, in turn, will allow more appropriate data policies to emerge, opening up new forms of collaboration.
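The idea of tracking which rules apply as data flows through a multi-input-multi-output graph can be sketched in a few lines of Python. This is a deliberately simplified illustration, not Dr.Aid's formal language or reasoning: it assumes that each derived item simply inherits the union of the rules attached to its inputs, and the node and rule names are invented for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def propagate_rules(inputs_of, source_rules):
    """Compute the rules applicable to every node of a data-flow graph.

    inputs_of:    maps each derived node to the set of nodes it consumes.
    source_rules: maps nodes to the rules attached to them directly.
    Simplifying assumption: a derived node inherits the union of its inputs' rules.
    """
    applicable = {}
    # static_order() yields every node after all of its inputs.
    for node in TopologicalSorter(inputs_of).static_order():
        rules = set(source_rules.get(node, ()))
        for upstream in inputs_of.get(node, ()):
            rules |= applicable[upstream]
        applicable[node] = rules
    return applicable


# Invented example: two sources are merged, and two products derive from the merge.
graph = {
    "merged": {"catalogue", "waveforms"},
    "report": {"merged"},
    "map": {"merged"},
}
rules = {
    "catalogue": {"acknowledge-provider"},
    "waveforms": {"no-commercial-use"},
}
applicable = propagate_rules(graph, rules)
```

A real system would also need rules that change or expire as data is transformed, which a plain union cannot express; the sketch only shows why the graph has to be walked in dependency order.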