Abstract: Research data in the humanities needs to be sustainable, and access to digital resources must be possible over a long period. Only if these prerequisites are fulfilled can research data be used as a source for other projects. In addition, reliability is a fundamental requirement so that digital sources can be cited, reused, and quoted. To address this problem, we present our solution: the Data and Service Center for the Humanities located in Switzerland. The centralized infrastructure is based on flexible and …
“…Two papers took a broad look at both migration and maintenance (Stigler & Steiner, 2018; Thomer & Twidale, 2014). And finally, 11 papers discuss migration as a background or tangential concern (Breeding, 2002; Brush & Jiras, 2019; Byrne, 2014; Gentry et al, 2021; Kansa, 2005; Kipnis MSI et al, 2019; Knight‐Davis et al, 2015; Liu & Zhou, 2011; Rieh et al, 2008; Rosenthaler et al, 2015; Yin et al, 2020). …”
Database maintenance and migration are critical but under‐supported activities in libraries, archives, museums (LAMs), and other scholarly spaces. Existing guidelines for digital curation rarely account for the maintenance needed to keep digital curation infrastructures functioning over time. Though many case studies have been published describing individual instances of migration, there has been little generalizable research done in this area. Thus, it is challenging to understand overall trends or best practices in this space. We bridge this gap by conducting an integrative literature review of papers describing database migrations and maintenance in LAMs and other scholarly contexts. By qualitatively coding 75 articles from 58 publication venues, we identify common motivations for database migrations and maintenance actions. We find that databases are migrated to support changing user needs as well as to ward off technological obsolescence; we also find that common challenges include schema crosswalking and a need for data cleaning. Practitioners describe community collaboration as key in surmounting these challenges. Through this integrative review, we build a base for further best practices development and identify a need to better model database curation as part of the digital curation lifecycle.
“…Gravsearch has been developed as part of Knora (Knowledge Organization, Representation, and Annotation), an application developed by the Data and Service Center for the Humanities (DaSCH) [22] to ensure the long-term availability and reusability of research data in the humanities. 3 The Swiss National Science Foundation (SNSF) requires researchers to have a plan for making their research data publicly accessible, 4 and the DaSCH was created to provide the necessary infrastructure to meet this requirement.…”
Section: Institutional and Technical Context (mentioning)
confidence: 99%
“…Here we are looking for a text containing the word Acta that is marked up as a bibliographical reference. 22 Knora can store text markup as 'standoff markup': each markup tag is represented as an entity in the triplestore, with start and end positions referring to a substring in the text. This makes it straightforward to represent non-hierarchical structures in markup, 23 and makes it possible for queries to combine criteria referring to text markup with criteria referring to other entities in the triplestore, including links within text markup that point to RDF resources outside the text.…”
Section: Example 3: Searching For Text Markup (mentioning)
confidence: 99%
“…This makes it straightforward to represent non-hierarchical structures in markup, 23 and makes it possible for queries to combine criteria referring to text markup with criteria referring to other entities in the triplestore, including links within text markup that point to RDF resources outside the text. Projects may define own standoff entities in their 22 Gravsearch translates this FILTER into two operations:…”
Section: Example 3: Searching For Text Markup (mentioning)
RDF triplestores have become an appealing option for storing and publishing humanities data, but available technologies for querying this data have drawbacks that make them unsuitable for many applications. Gravsearch (Virtual Graph Search), a SPARQL transformer developed as part of a web-based API, is designed to support complex searches that are desirable in humanities research, while avoiding these disadvantages. It does this by introducing server software that mediates between the client and the triplestore, transforming an input SPARQL query into one or more queries executed by the triplestore. This design suggests a practical way to go beyond some limitations of the ways that RDF data has generally been made available.
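The standoff markup described in the excerpts above (each tag stored as its own entity, with start and end positions pointing into the text) can be illustrated with a small in-memory sketch. This is not Knora's data model or API, which stores such entities in an RDF triplestore. The class `StandoffTag`, the tag names `bibref`, `italic`, and `emphasis`, the helper functions, and the sample sentence are all hypothetical, chosen only to mirror the "text containing the word Acta that is marked up as a bibliographical reference" example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandoffTag:
    """One markup tag kept apart from the text it annotates (hypothetical model)."""
    name: str   # tag type, e.g. "bibref" (assumed name)
    start: int  # index of the first covered character
    end: int    # index one past the last covered character

    def covered(self, text: str) -> str:
        """Return the substring of `text` that this tag refers to."""
        return text[self.start:self.end]


def tag_span(text: str, name: str, substring: str) -> StandoffTag:
    """Build a tag covering the first occurrence of `substring` in `text`."""
    start = text.index(substring)
    return StandoffTag(name, start, start + len(substring))


def tags_containing(text, tags, name, word):
    """Find tags of type `name` whose covered substring contains `word`."""
    return [t for t in tags if t.name == name and word in t.covered(text)]


def overlaps_without_nesting(a: StandoffTag, b: StandoffTag) -> bool:
    """True if `b` starts inside `a` but ends after it (non-hierarchical overlap)."""
    return a.start < b.start < a.end < b.end


text = "See Acta Eruditorum, 1684, for the proof."
tags = [
    tag_span(text, "bibref", "Acta Eruditorum, 1684"),
    tag_span(text, "italic", "Acta Eruditorum"),
    # Starts inside the bibliographical reference and ends outside it,
    # which inline, strictly nested markup could not express directly.
    tag_span(text, "emphasis", "1684, for the proof"),
]

matches = tags_containing(text, tags, "bibref", "Acta")
for tag in matches:
    print(tag.name, repr(tag.covered(text)))
```

Because the text itself is never modified, tags can overlap freely, and a search can combine a condition on the tag type with a condition on the covered substring, which is the kind of combined criterion the excerpt describes.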
“…The establishment of humanities data centres as a collaboration between well established infrastructure providers and digital humanists is one possible approach to ensure a trustworthy long term preservation. Examples are the HDC (Buddenbohm, Engelhardt & Wuttke 2016), the Data Center for the Humanities (DCH) Cologne (Sahle & Kronenwett 2016), the Data and Service Center for the Humanities (DaSCH) (Rosenthaler, Fornaro & Clivaz 2015) and the Kompetenznetzwerk Digitale Edition (KONDE, 2018). Key to the success of such infrastructures is the relevance to the research community of the services provided.…”
Section: Service Levels For Long Term Availability Of SDEs (mentioning)
Ensuring the long-term availability of research data forms an integral part of data management services. Where OAIS compliant digital preservation has been established in recent years, in almost all cases the services aim at the preservation of file-based objects. In the Digital Humanities, research data is often represented in highly structured aggregations, such as Scholarly Digital Editions. Naturally, scholars would like their editions to remain functionally complete as long as possible. Besides standard components like webservers, the presentation typically relies on project specific code interacting with client software like webbrowsers. Especially the latter being subject to rapid change over time invariably makes such environments awkward to maintain once funding has ended. Pragmatic approaches have to be found in order to balance the curation effort and the maintainability of access to research data over time. A sketch of four potential service levels aiming at the long-term availability of research data in the humanities is outlined: (1) Continuous Maintenance, (2) Application Conservation, (3) Application Data Preservation, and (4) Bitstream Preservation. The first being too costly and the last hardly satisfactory in general, we suggest that the implementation of services by an infrastructure provider should concentrate on service levels 2 and 3. We explain their strengths and limitations considering the example of two Scholarly Digital Editions.