The first special issue of the International Journal of Digital Humanities (IJDH) is about born-digital archives, their preservation, and research perspectives involving born-digital primary records in the humanities. It is not only the result of a collaboration between the journal's editor-in-chief, Gábor Palkó, Co-Director of the Centre for Digital Humanities at Eötvös University, who is interested in the practice and theory of digital archives, and the editor of this volume, Thorsten Ries, who conducts research on born-digital dossiers génétiques with digital forensic methods at Ghent University. It is also meant as a programmatic call to intensify cross-sectoral collaboration between galleries, libraries, archives, and museums (GLAM institutions), digital preservation projects, and humanities research working with digital primary sources. Besides offering exciting opportunities, the born-digital historical record of the present age poses great challenges for archival science, librarianship, museology, and information science on the one hand, and for humanities research on the other. Personal digital archives; the documentation records or datasets of legal, governmental, institutional, scientific, public, and non-governmental organisations; public repositories of digital publications; web archives; and social media archives are incredibly rich, diverse, and multi-faceted treasure troves for historians, political scientists, sociologists, philologists, literary scholars, art historians, digital humanists, and researchers from other humanities disciplines. The long-term preservation, curatorship, and custodianship of these records, and the development of setups, applications, and application programming interfaces (APIs) to make them available for research, have been the subject of multiple large, successful international projects in archival science, librarianship, and information science.
Landmark projects such as the archiving of the digital collections of Salman Rushdie at Emory University Library (Rockmore 2014; Waugh and Russey
Web archives store born-digital documents, which are usually collected from the Internet by crawlers and stored in the Web ARChive (WARC) format. The trustworthiness and integrity of web archives remain an open challenge, especially in the news portal domain, which faces the additional challenge of censorship even in democratic societies. The aim of this paper is to present a lightweight, blockchain-based solution for web archive validation, intended to ensure that documents retrieved by crawlers remain verifiably authentic for many years to come. We developed our archive validation solution as an extension and continuation of our work on web crawler development, which mainly targets news portals. The system is designed as an overlay over a blockchain with a proof-of-stake (PoS) distributed consensus algorithm. PoS was chosen for its lower ecological footprint compared to proof-of-work solutions (e.g. Bitcoin) and its lower expected investment in computing infrastructure. We based our prototype on the open-source Nxt blockchain and implemented it in Python. The prototype was tested on web archive content crawled from Hungarian news portals at two different timestamps, with more than 1 million articles in total. We concluded that the proposed solution is accessible, usable by different stakeholders to validate crawled content, and deployable on cheap commodity hardware; that it tackles the archive integrity challenge; and that it is capable of efficiently managing duplicate documents.
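To make the validation idea more concrete, here is a minimal sketch, assuming that validation reduces to fingerprinting each crawled document with SHA-256 and later checking retrieved copies against the recorded fingerprints. The `HypotheticalLedger` class is an in-memory stand-in for the blockchain: it does not call the Nxt API, and the names `content_digest`, `register`, and `validate` are illustrative assumptions, not taken from the paper.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Tuple


def content_digest(body: bytes) -> str:
    """Return the SHA-256 hex digest of an archived document's payload."""
    return hashlib.sha256(body).hexdigest()


@dataclass
class HypotheticalLedger:
    """In-memory stand-in for an append-only blockchain ledger.

    Assumption: a real deployment would submit each digest as a transaction
    on a proof-of-stake chain; here we only map each digest to the URL and
    crawl timestamp at which that content was first seen.
    """
    first_seen: Dict[str, Tuple[str, str]] = field(default_factory=dict)

    def register(self, url: str, body: bytes, crawl_ts: str) -> bool:
        """Record a document's digest; return False if the content is a duplicate."""
        digest = content_digest(body)
        if digest in self.first_seen:
            # Identical content was already anchored by an earlier crawl.
            return False
        self.first_seen[digest] = (url, crawl_ts)
        return True

    def validate(self, body: bytes) -> bool:
        """Check whether a retrieved document matches a registered digest."""
        return content_digest(body) in self.first_seen


# Usage: two crawls of the same article, then a tampered copy.
ledger = HypotheticalLedger()
print(ledger.register("https://example.hu/a", b"<html>article</html>", "crawl-1"))  # True
print(ledger.register("https://example.hu/a", b"<html>article</html>", "crawl-2"))  # False
print(ledger.validate(b"<html>tampered</html>"))  # False
```

Keying the ledger on the content digest rather than the URL is one simple way to detect duplicate documents across crawls; the abstract does not say whether the authors' system deduplicates in this way.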
Examining the transaction records of integrated library systems when planning the digitisation of library holdings.