Comprehensible Control for Researchers and Developers Facing Data Challenges

Atkinson, Malcolm; Filgueira, Rosa; Klampanos, Iraklis A.; Koukourikos, Antonis; Krause, Amrey; Magnoni, Federica; Pagé, Christian; Rietbrock, Andreas; Spinuso, Alessandro

doi:10.1109/escience.2019.00042

Cited by 5 publications

(9 citation statements)

References 18 publications

“…We start from the Waveform pre-processing workflow in Figure 4. It is composed of simple stateless PEs receiving and producing waveform time-series according to a data format defined in the Obspy software package 8 [28]. The provenance type SeismoType, listed in Table II, can handle such a payload thanks to the implementation of its extractItemMetadata method, which uses the Obspy toolkit to access seismic data and metadata.…”

Section: A Seismic Analysis Workflows: Lineage Precision and Metadat (mentioning)

confidence: 99%

Active Provenance for Data-Intensive Workflows: Engaging Users and Developers

Spinuso

Atkinson

Magnoni

2019

2019 15th International Conference on eScience (eScience)

Self Cite

We present a practical approach for provenance capturing in Data-Intensive workflow systems. It provides contextualisation by recording injected domain metadata with the provenance stream. It offers control over lineage precision, combining automation with specified adaptations. We address provenance tasks such as extraction of domain metadata, injection of custom annotations, accuracy and integration of records from multiple independent workflows running in distributed contexts. To allow such flexibility, we introduce the concepts of programmable Provenance Types and Provenance Configuration. Provenance Types handle domain contextualisation and allow developers to model lineage patterns by redefining API methods, composing easy-to-use extensions. Provenance Configuration, instead, enables users of a Data-Intensive workflow execution to prepare it for provenance capture, by configuring the attribution of Provenance Types to components and by specifying grouping into semantic clusters. This enables better searches over the lineage records. Provenance Types and Provenance Configuration are demonstrated in a system being used by computational seismologists. It is based on an extended provenance model, S-PROV.

“…It is in use for demanding applications in seismology and climate-impact research, for the VERCE 1 [7] and CLIPC 2 projects, and now the DARE project 3. The DARE architecture and vision, described in a contemporary paper [8], is the context for our recent achievements, illustrated with the seismic rapid assessment use case (Section VI).…”

Section: Introduction (mentioning)

confidence: 99%

Active Provenance for Data-Intensive Workflows: Engaging Users and Developers

Spinuso

Atkinson

Magnoni

2019

2019 15th International Conference on eScience (eScience)

Self Cite

“…Examples of commercial clouds being used in large-scale scientific contexts are found on both sides of the Atlantic: in the European Open Science Cloud 1 (EOSC) case as well as in the massive ongoing migration of data and other resources onto Amazon's AWS by NASA 2. It follows that while potential for large scale data-driven experimentation increases, so does complexity. At the same time, making use of vendor-specific features may lead to lock-in.…”

Footnotes: 1 https://ec.europa.eu/research/openscience/index.cfm?pg=open-science-cloud; 2 https://aws.amazon.com/partners/success/nasa-image-library/. This work has been supported by the EU H2020 research and innovation programme under grant agreement No 777413.

Section: Dare: a Reflective Platform Designed To Enable (mentioning)

confidence: 99%

“…6) Exposing all relevant functionality via a set of RESTful APIs that (1) effectively hide technical detail and (2) enable research developers to build solutions that exploit multiple underlying e-infrastructures with minimal effort. The overarching vision behind DARE as well as its main architectural considerations and components can be found in [2]. This paper describes the current technical instantiation in response to this vision, its main software components and their interactions.…”

Section: Dare: a Reflective Platform Designed To Enable (mentioning)

confidence: 99%

DARE: A Reflective Platform Designed to Enable Agile Data-Driven Research on the Cloud

Klampanos

Magnoni

Casarotti

et al. 2019

2019 15th International Conference on eScience (eScience)

Self Cite

The DARE platform has been designed to help research developers deliver user-facing applications and solutions over diverse underlying e-infrastructures, data and computational contexts. The platform is Cloud-ready, and relies on the exposure of APIs, which are suitable for raising the abstraction level and hiding complexity. It implements the cataloguing and execution of fine-grained and Python-based dispel4py workflows as services. Reflection is achieved via a logical knowledge base, comprising multiple internal catalogues, registries and semantics, while it supports persistent and pervasive data provenance. This paper presents design and implementation aspects of the DARE platform, and provides directions for future development.

Dr.Aid: Supporting Data-governance Rule Compliance for Decentralized Collaboration in an Automated Way

Rui

Atkinson

Papapanagiotou

et al. 2021

Proc. ACM Hum.-Comput. Interact.

Self Cite

Collaboration across institutional boundaries is widespread and increasing today. It depends on federations sharing data that often have governance rules or external regulations restricting their use. However, the handling of data governance rules (a.k.a. data-use policies) remains manual, time-consuming and error-prone, limiting the rate at which collaborations can form and respond to challenges and opportunities, inhibiting citizen science and reducing data providers' trust in compliance. Using an automated system to facilitate compliance handling substantially reduces the time needed for such non-mission work, thereby accelerating collaboration and improving productivity. We present a framework, Dr.Aid, that helps individuals, organisations and federations comply with data rules, using automation to track which rules are applicable as data is passed between processes and as derived data is generated. It encodes data-governance rules using a formal language and performs reasoning on multi-input-multi-output data-flow graphs in decentralised contexts. We test its power and utility by working with users performing cyclone tracking and earthquake modelling to support mitigation and emergency response. We query standard provenance traces to detach Dr.Aid from details of the tools and systems they are using, as these inevitably vary across members of a federation and through time. We evaluate the model in three aspects by encoding real-life data-use policies from diverse fields, showing its capability for real-world usage and its advantages compared with traditional frameworks. We argue that this approach will lead to more agile, more productive and more trustworthy collaborations and show that the approach can be adopted incrementally. This, in turn, will allow more appropriate data policies to emerge, opening up new forms of collaboration.
