Background
The Natural History Museum, London (NHMUK) has embarked on an ambitious programme to digitise its collections. The first phase of this programme has been to undertake a series of pilot projects that will develop the workflows and infrastructure needed to support mass digitisation of very large scientific collections. This paper presents the results of one of the pilot projects, iCollections. This project digitised all the lepidopteran specimens usually considered as butterflies, 181,545 specimens representing 89 species from the British Isles and Ireland. The data digitised include species name, georeferenced location, collector and collection date: the what, where, who and when of specimen data. In addition, a digital image of each specimen was taken. This paper explains the way the data were obtained and the background to the collections which made up the project.

New information
Specimen-level data associated with British and Irish butterfly specimens have not been available before, and the iCollections project has released this valuable resource through the NHM data portal.
The Natural History Museum, London (NHMUK) has embarked on an ambitious programme to digitise its collections. The first phase of this programme was to undertake a series of pilot projects to develop the workflows and infrastructure needed to support mass digitisation of very large scientific collections. This paper presents the results of one of the pilot projects, iCollections. This project digitised all the lepidopteran specimens usually considered as butterflies, 181,545 specimens representing 89 species from the British Isles and Ireland. The data digitised include species name, georeferenced location, collector and collection date: the what, where, who and when of specimen data. In addition, a digital image of each specimen was taken. A previous paper explained the way the data were obtained and the background to the collections that made up the project. The present paper describes the technical, logistical, and economic aspects of managing the project.
Since 2020, the Natural History Museum, London (NHM) has been running the RECODE (Rethinking Collections Data Ecosystems) programme, an initiative that will provision a more open, manageable, configurable and interoperable collections management system (CMS) for the museum. With the overall aim of going live with an initial version of the new CMS by 2025, the first phase of defining a platform-agnostic set of high-level requirements and selecting a new technology partner and platform is nearing completion. The requirements, conceptual data models and other procurement documentation are shared openly through the Open Science Framework (OSF) platform so that the material may both benefit the wider natural sciences community and elicit its feedback. RECODE has strived to ensure that our new supplier and technology platform will be well positioned to deliver on the wider vision for community data interoperability, sharing and annotation. Through this presentation, we hope to continue our engagement with the global community by introducing our vision and describing our efforts to ensure that data sharing through technical interoperability and data standards are core features of the new solution.

As a digital representation of the collections and related processes, events and transactions, a CMS is an essential tool for many natural science collections, replacing systems that were first analogue and paper-based, and later often distributed across multiple siloed, unstandardised, and unconnected files and databases. Consolidating that data and functionality into a coherent, centralised application (as was first achieved at the NHM in 2002) facilitates more effective management of, and access to, both the physical collections and the data describing them.
This consolidation also enabled the construction of a core collections data ecosystem within the museum, linking the CMS with frozen collections, providing a basic process for ingestion from digitisation workflows, and setting up a pipeline to offer up data to the NHM Data Portal for publication to the community (Fig. 1). Although this was an important step on the path, the bespoke nature of these integrations, in part due to technical limitations in the CMS platform for importing and exporting data at scale, has limited further progress in this area. Even just within the museum's suite of science and collections data platforms there is a range of further potential integrations around the CMS that could add considerable value in streamlining processes and joined-up decision support (Fig. 2). Modern technical capabilities, such as APIs, workflow capabilities and data models, dashboards and analytics, and integrated artificial intelligence (AI) and machine learning (ML) services, provide great potential for better management, sharing and exploitation of the data and the collections themselves. These capabilities, in particular those that support data interoperability, then open up much greater potential for positioning the institutional CMS within the wider external collections, biodiversity and geodiversity data ecosystem (Fig. 3). This offers much greater potential for using community-curated authorities, tools and services (e.g., Catalogue of Life, GeoNames, Bionomia and Wikidata), allows closer integration with data aggregators and service providers such as the Global Biodiversity Information Facility (GBIF), Distributed System of Scientific Collections (DiSSCo), GeoCASe and Global Genome Biodiversity Network (GGBN), and opens up avenues for joining future initiatives like community data annotation.
Over the past decade, the NHM has become increasingly aware that one of the major barriers to moving forward with our ambitions in this regard is outdated infrastructure and technology in the CMS marketplace, which has struggled to keep pace with the wider technology landscape. This realisation has driven the museum to consider more enterprise (and better resourced) technology sectors like Content Services Platforms (CSP). These platforms provide mature products that include these more cutting-edge technical capabilities, and tend to be highly configurable in order to be applicable across a wide range of domains. The onus, however, would be on us to design the data models and processes that would need to be configured within these platforms, and this design work forms a major component of the RECODE programme. In this regard, both existing and emerging community standards and models like Spectrum, Darwin Core, Access to Biological Collections Data + Extension for Geosciences (ABCD+EFG), Latimer Core and the International Committee for Documentation Conceptual Reference Model (CIDOC CRM) are vital and will be used heavily to inform this work. Throughout the RECODE process, the NHM intends to remain focused on the bigger community vision and, by creating a more open, flexible and community-ready CMS with a stronger focus on interoperability, standards, data quality and data sharing from the outset, to pioneer a potential new CMS approach that may benefit others as well as ourselves.