Data infrastructures currently face a rapid growth in the number of research data objects, together with an increasing variety of types (data types, formats) and of workflows by which objects must be managed across their lifecycle. Researchers want to shorten the path from data generation to analysis and publication, and the full workflow needs to become transparent to multiple stakeholders, including research administrators and funders. This poses challenges for research infrastructures and user-oriented data services: they must not only make data and workflows findable, accessible, interoperable and reusable, but do so in a way that leverages machine support for better efficiency. Findability is a primary need to be addressed, and improving it benefits other aspects of data and workflow management. In this article, we describe how machine capabilities can be extended to make workflows more findable, in particular by leveraging the Digital Object Architecture, common object operations and machine learning techniques.
Instruments play an essential role in creating research data. Given the importance of instruments and associated metadata for assessing data quality and enabling data reuse, globally unique, persistent and resolvable identification of instruments is crucial. The Research Data Alliance Working Group on Persistent Identification of Instruments (PIDINST) developed a community-driven solution for persistent identification of instruments, which we present and discuss in this paper. Based on an analysis of ten use cases, PIDINST developed a metadata schema and prototyped its implementation with DataCite and ePIC as representative persistent identifier infrastructures, and with HZB (Helmholtz-Zentrum Berlin für Materialien und Energie) and BODC (British Oceanographic Data Centre) as representative institutional instrument providers. These implementations demonstrate the viability of the proposed solution in practice. Moving forward, PIDINST will further catalyse adoption and consolidate the schema by addressing new stakeholder requirements.
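The idea of an instrument metadata record conforming to such a schema can be sketched in a few lines. The property names below paraphrase the published PIDINST schema and the PID value is fictitious; both are illustrative assumptions rather than a definitive rendering of the schema.

```python
# Illustrative PIDINST-style instrument metadata record with a
# minimal completeness check. Property names and the PID value
# are assumptions for illustration; consult the published schema.

REQUIRED = {"Identifier", "Name", "Owner", "Manufacturer"}

instrument = {
    "Identifier": "21.T11998/0000-001A-3905-F",  # fictitious Handle-style PID
    "Name": "Example CTD profiler",
    "Owner": {"ownerName": "BODC"},
    "Manufacturer": {"manufacturerName": "Example Instruments Ltd."},
    "Model": "CTD-1000",
    "InstrumentType": "CTD",
    "MeasuredVariable": ["conductivity", "temperature", "depth"],
}

def missing_properties(record: dict) -> set:
    """Return required properties absent from a metadata record."""
    return REQUIRED - record.keys()

print(sorted(missing_properties(instrument)))     # -> []
print(sorted(missing_properties({"Name": "x"})))  # -> ['Identifier', 'Manufacturer', 'Owner']
```

A check like this is the kind of machine-actionable validation that a shared schema makes possible across providers such as HZB and BODC.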
Some of the early Research Data Alliance working groups reused the notion of digital objects: digital entities described by metadata and referenced by a persistent identifier. More recently, the FAIR principles have taken on a prominent role as a framework for the sustainability of scientific data. Both approaches have always focused on machine actionability, the capability of computational systems to use services on data without human intervention. The more technical approach of digital objects turned out to provide a complementary, technical view on several aspects of the FAIR policy framework. After a deeper analysis and integration of these concepts by a group of European data experts, the discussion intensified around so-called FAIR digital objects. These objects, however, need to be accompanied by services as building blocks for automated processes. Here we describe the components of this framework, its potential, and the services required within it.

Necessary Abstractions in the Data Domain

Several studies in relevant data analytics projects, for instance a survey from 2013 reported by RDA Europe (RDA Europe 2019), indicate that up to 80% of the time of experts working with data is spent on data wrangling (i.e. making data ready for analytics). This suggests that only a high degree of automation based on simple structures can provide an alternative to this highly inefficient and error-prone way of data handling.

The major obstacle to automation is the heterogeneity and complexity of data; abstraction is a generic way to hide this heterogeneity and complexity through encapsulation and virtualization. By encapsulation, details are hidden that are not needed at a specific layer. For instance, at the data infrastructure layer no difference needs to be made between data, metadata, software, semantic assertions, etc. All can be seen as some kind of data, for example as files in a filesystem, that is copied, changed or deleted.
At that layer, operations do not distinguish between metadata and data, whereas at the data management and reuse layer a distinction is necessary, and metadata must be used to govern the management operations on data. By virtualization, one substitutes objects by their logical representation. The most abstract form of such a logical representation is a pointer that leads to the object, a classic and widely used approach in computer science that hides all complexity behind a pure reference to the object. Virtual machines, as another example of virtualization, hide only the hardware but still expose most of the internal structure in the logical representation.
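Both abstractions can be combined in a short sketch. The class and PID strings below are hypothetical and serve only to illustrate the two ideas: the store encapsulates by treating data, metadata and software uniformly as opaque byte blobs, and virtualizes by letting clients hold nothing but a PID-like pointer that is resolved on demand.

```python
# Minimal sketch of encapsulation and virtualization in a data
# infrastructure layer; names and PID strings are hypothetical.

class ObjectStore:
    """Infrastructure layer: uniform operations on opaque blobs."""

    def __init__(self):
        self._blobs = {}  # pid -> bytes

    def register(self, pid: str, content: bytes) -> None:
        self._blobs[pid] = content

    def resolve(self, pid: str) -> bytes:
        # The pointer (PID) is the whole logical representation;
        # all storage complexity stays hidden behind this call.
        return self._blobs[pid]

    def copy(self, pid: str, new_pid: str) -> None:
        # The same operation applies whether the blob holds data,
        # metadata or software -- the layer does not distinguish.
        self._blobs[new_pid] = self._blobs[pid]

store = ObjectStore()
store.register("hdl:21.T/data-1", b"42,17,3")             # data
store.register("hdl:21.T/meta-1", b'{"creator": "HZB"}')  # metadata

store.copy("hdl:21.T/data-1", "hdl:21.T/data-1.bak")
print(store.resolve("hdl:21.T/data-1.bak"))  # -> b'42,17,3'
```

A management layer built on top of this would reintroduce the data/metadata distinction, using the metadata blob to govern operations on the data blob it describes.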