Since 2020, the Natural History Museum, London (NHM) has been running the RECODE (Rethinking Collections Data Ecosystems) programme, an initiative that will provision a more open, manageable, configurable and interoperable collections management system (CMS) for the museum. With the overall aim of going live with an initial version of the new CMS by 2025, the first phase of defining a platform-agnostic set of high-level requirements and selecting a new technology partner and platform is nearing completion. The requirements, conceptual data models and other procurement documentation are shared openly through the Open Science Framework (OSF) platform so that any material may benefit, and elicit feedback from, the wider natural sciences community. RECODE has strived to ensure that our new supplier and technology platform will be well positioned to deliver on the wider vision for community data interoperability, sharing and annotation. Through this presentation, we hope to continue our engagement with the global community by introducing our vision and describing our efforts to ensure that data sharing, through technical interoperability and data standards, is a core feature of the new solution.

As a digital representation of the collections and related processes, events and transactions, a CMS is an essential tool for many natural science collections, replacing systems that were first analogue and paper-based, and later often distributed across multiple siloed, unstandardised and unconnected files and databases. Consolidating that data and functionality into a coherent, centralised application (as was first achieved at the NHM in 2002) facilitates more effective management of, and access to, both the physical collections and the data describing them.
This consolidation also enabled the construction of a core collections data ecosystem within the museum, linking the CMS with frozen collections, providing a basic process for ingestion from digitisation workflows, and setting up a pipeline to offer up data to the NHM Data Portal for publication to the community (Fig. 1). Although this was an important step, the bespoke nature of these integrations, in part due to technical limitations in the CMS platform for importing and exporting data at scale, has limited further progress in this area. Even within the museum’s own suite of science and collections data platforms, there is a range of further potential integrations around the CMS that could add considerable value by streamlining processes and supporting joined-up decision making (Fig. 2). Modern technical capabilities, such as APIs, workflow capabilities and data models, dashboards and analytics, and integrated artificial intelligence (AI) and machine learning (ML) services, provide great potential for better management, sharing and exploitation of the data and the collections themselves. These capabilities, in particular those that support data interoperability, in turn open up much greater potential for positioning the institutional CMS within the wider external collections, biodiversity and geodiversity data ecosystem (Fig. 3). This not only offers much greater potential for using community-curated authorities, tools and services (e.g., Catalogue of Life, GeoNames, Bionomia and Wikidata), but also enables closer integration with data aggregators and service providers such as the Global Biodiversity Information Facility (GBIF), Distributed System of Scientific Collections (DiSSCo), GeoCASe and Global Genome Biodiversity Network (GGBN), and opens up avenues for joining future initiatives like community data annotation.
Over the past decade, the NHM has become increasingly aware that one of the major barriers to moving forward with our ambitions in this regard is outdated infrastructure and technology in the CMS marketplace, which has struggled to keep pace with the wider technology landscape. This realisation has driven the museum to consider more enterprise-oriented (and better-resourced) technology sectors like Content Services Platforms (CSP). These platforms provide mature products that include these more cutting-edge technical capabilities, and tend to be highly configurable in order to be applicable across a wide range of domains. The onus, however, would be on us to design the data models and processes that would need to be configured within these platforms, and this design work forms a major component of the RECODE programme. In this regard, both existing and emerging community standards and models like Spectrum, Darwin Core, Access to Biological Collections Data + Extension for Geosciences (ABCD+EFG), Latimer Core and the International Committee for Documentation Conceptual Reference Model (CIDOC CRM) are vital and will be used heavily to inform this work. Throughout the RECODE process, NHM intends to remain focused on the bigger community vision and, by creating a more open, flexible and community-ready CMS with a stronger focus on interoperability, standards, data quality and data sharing from the outset, to pioneer a potential new CMS approach that may benefit others as well as ourselves.
The data modelling of physical natural history objects has never been trivial, and the need for greater interoperability and adherence to multiple standards and internal requirements has made the task more challenging than ever. The Natural History Museum’s internal RECODE (Rethinking Collections Data Ecosystems; see Dupont et al. 2022) programme has taken the approach of creating a data model to fit these internal and external requirements, rather than trying to force an existing data model to work with our next-generation collections management system (CMS) requirements. In this regard, community standards become vitally important, and existing and emerging standards and models like Spectrum, Darwin Core, Access to Biological Collection Data (ABCD), Extended for Geosciences (EFG), Latimer Core and the Conceptual Reference Model from the International Committee for Documentation (CIDOC CRM) have been and will be used heavily to inform this work.

The poster will provide a starting point for publicly sharing and discussing the work that the RECODE programme has done, and for eliciting ideas that members of the community may have regarding its continuing improvement.

We have concentrated on creating a backbone for the data model, from collecting, through object curation, to scientific identification. This has yielded two significant outcomes:

The Collection Object: Traditional CMS data models treat each specimen as a single record in the database. The RECODE model recognises that there are a number of different concepts that need their own entities:

Collected material: the specimens collected in the field are not always fully identified or separated into discrete items.

Stored object: the aim of the RECODE model is to treat all objects as the same type of entity, with relationships between them enhancing the data. For example, a collection object is defined as a discrete object that can be moved and loaned independently. Its specific type (e.g., specimen, preparation, derivation) is given by its relationships to other collection objects.

Identifiable item: what can be taxonomically identified does not necessarily have a 1-to-1 relationship with the stored objects. One item may contain multiple species (e.g., a parasite and host; a rock containing many minerals) or one species may be split across many objects (e.g., long branches on two or more herbarium sheets; large skeletons stored in separate locations).

The Collection Level Description (CLD): This is a construct to enable the attachment of descriptive and quantitative data to groups of collection objects, rather than to individual collection objects. There will always be a need for an inventory which represents the basic holdings, organisation and indexing of collections, as well as a variety of use cases for grouping collection objects and attaching information at the group level.

The next challenge is to integrate the concepts more closely with each other to provide the best possible description of the collection and make it as shareable as possible. Some of the current challenges being addressed are:

An object group may represent a heterogeneous group of objects.

There will be multiple parallel CLD schemes for different purposes.

Different attributes and metrics will be relevant to different schemes.

For some use cases, we need to be able to quantify relationships between an object group and its attributes as well as attaching metrics to the object group itself.

We also need to be able to reflect relationships between object groups.

These challenges necessitate a data model that has a considerable degree of flexibility but enables rules and constraints to be introduced as appropriate for the different use cases. It is also important that, wherever possible, the model uses the same attributes as individual collection objects, to allow object groups to be implicitly linked to collection object records through common attributes as well as explicitly linked within the model. The aim of the conceptual model is to reflect these requirements.
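
To make the distinctions above more concrete, the following is a minimal Python sketch of how collection objects, identifiable items and CLD object groups might relate to one another. It is an illustration only and not the RECODE conceptual model itself: every class name, field name, identifier, relation label (such as "preparation_of") and attribute value is a hypothetical choice made for this example. It shows an object's type being derived from its relationships, a many-to-many link between identifiable items and stored objects, and an object group linked to objects both explicitly and implicitly through shared attributes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CollectionObject:
    """A discrete object that can be moved and loaned independently."""
    object_id: str
    attributes: dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class Relationship:
    """Directed link between two collection objects.

    The subject's specific type is read from its relationships rather than
    stored on the object. The relation vocabulary is hypothetical.
    """
    subject_id: str
    relation: str
    target_id: str


@dataclass
class IdentifiableItem:
    """Something that can be taxonomically identified.

    One stored object may carry several items, and one item may span
    several stored objects.
    """
    item_id: str
    taxon: str
    object_ids: set[str] = field(default_factory=set)


@dataclass
class ObjectGroup:
    """A Collection Level Description unit within one CLD scheme."""
    group_id: str
    scheme: str
    attributes: dict[str, str] = field(default_factory=dict)
    metrics: dict[str, float] = field(default_factory=dict)
    explicit_members: set[str] = field(default_factory=set)


def object_types(object_id: str, relationships: list[Relationship]) -> set[str]:
    """Derive an object's type(s) from its outgoing relationships."""
    return {r.relation for r in relationships if r.subject_id == object_id}


def taxa_in_object(object_id: str, items: list[IdentifiableItem]) -> list[str]:
    """List every taxon identified on a single stored object."""
    return sorted({i.taxon for i in items if object_id in i.object_ids})


def group_members(group: ObjectGroup, objects: list[CollectionObject]) -> set[str]:
    """Explicit members plus objects sharing all of the group's attributes."""
    implicit = {
        o.object_id
        for o in objects
        if group.attributes
        and all(o.attributes.get(k) == v for k, v in group.attributes.items())
    }
    return group.explicit_members | implicit


# Hypothetical example data (identifiers and values are invented).
objects = [
    CollectionObject("OBJ-1", {"department": "zoology"}),  # host carrying a parasite
    CollectionObject("OBJ-2", {"department": "botany"}),   # herbarium sheet 1 of 2
    CollectionObject("OBJ-3", {"department": "botany"}),   # herbarium sheet 2 of 2
    CollectionObject("OBJ-4", {"department": "zoology"}),  # preparation taken from OBJ-1
]
relationships = [Relationship("OBJ-4", "preparation_of", "OBJ-1")]
items = [
    IdentifiableItem("ITEM-1", "host taxon", {"OBJ-1"}),
    IdentifiableItem("ITEM-2", "parasite taxon", {"OBJ-1"}),
    IdentifiableItem("ITEM-3", "plant taxon", {"OBJ-2", "OBJ-3"}),
]
botany_group = ObjectGroup(
    "GRP-1", scheme="inventory",
    attributes={"department": "botany"}, metrics={"object_count": 2.0},
)
loan_group = ObjectGroup(
    "GRP-2", scheme="loan batch", explicit_members={"OBJ-1", "OBJ-4"},
)
```

In this sketch, keeping the type out of CollectionObject means a preparation and a specimen are the same kind of record, and separate schemes (here "inventory" and "loan batch") can carry their own attributes and metrics without changing the objects they describe.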