Douglas Tudhope is a professor at the University of Glamorgan where he leads the Hypermedia Research Unit and is principal investigator on the STAR and STELLAR projects. He can be reached by email at dstudhope
Abstract. This paper discusses the automatic generation of rich metadata for semantic search of grey literature connected with archaeological datasets. The work is part of the STAR project, in collaboration with English Heritage. An extension of the CIDOC CRM for the archaeological domain acts as a core ontology. This enables cross-search of various datasets and of an extract of the Archaeological Data Service OASIS library of excavation reports. Rich metadata is automatically extracted from grey literature, directed by the CRM, via a three-phase process of semantic enrichment employing the GATE toolkit. The results are expressed as XML annotations coupled with the reports and as RDF metadata, both expressed as CRM entities qualified by SKOS archaeological concepts. Examples from two applications are discussed. The Andronikos web portal delivers the annotated XML files for visual inspection. The STAR research demonstrator offers unified search of excavation data and grey literature in terms of the core ontology.
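To make the idea of "RDF metadata expressed as CRM entities, qualified by SKOS concepts" more concrete, the following is a minimal sketch, not the STAR pipeline itself. It assumes that an annotation found in a report (for example, the phrase "Roman coin") has already been identified by some extraction step. It writes that annotation as N-Triples. The CIDOC CRM, RDF and RDFS namespace URIs are real. The `Annotation` shape, the `example.org` report and vocabulary namespaces, and the choice of `P2_has_type` to link the entity to a SKOS concept are illustrative assumptions.

```python
from dataclasses import dataclass

# Real vocabulary namespaces
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# Hypothetical namespaces, for illustration only
REPORT_NS = "http://example.org/report/"
VOCAB_NS = "http://example.org/vocab/"


@dataclass
class Annotation:
    """A text span found in a report (hypothetical shape, not STAR's format)."""
    report_id: str
    start: int        # character offset where the span begins
    end: int          # character offset where the span ends
    text: str         # the annotated phrase
    crm_class: str    # CRM class name, e.g. "E19_Physical_Object"
    concept_id: str   # identifier of a SKOS concept in a (hypothetical) thesaurus


def escape_literal(s: str) -> str:
    """Escape backslashes, quotes and newlines for an N-Triples string literal."""
    return s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_ntriples(a: Annotation) -> list[str]:
    """Express one annotation as a typed CRM entity qualified by a SKOS concept."""
    subject = f"<{REPORT_NS}{a.report_id}#span-{a.start}-{a.end}>"
    return [
        f"{subject} <{RDF_TYPE}> <{CRM}{a.crm_class}> .",
        f'{subject} <{RDFS_LABEL}> "{escape_literal(a.text)}" .',
        # Assumed modelling choice: link the entity to its SKOS concept via P2_has_type
        f"{subject} <{CRM}P2_has_type> <{VOCAB_NS}{a.concept_id}> .",
    ]


if __name__ == "__main__":
    ann = Annotation("report42", 120, 130, "Roman coin", "E19_Physical_Object", "coin")
    print("\n".join(to_ntriples(ann)))
```

Keeping the character offsets in the subject URI is one simple way to tie the RDF statements back to the position of the matching XML annotation in the report.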
The online dissemination of datasets is becoming common practice within the archaeology domain. Since the legacy database schemas involved are often created on a per-site basis, cross-searching or reusing this data remains difficult. Employing an integrating ontology, such as the CIDOC CRM, is one step towards resolving these issues. However, this has tended to require computing specialists with detailed knowledge of the ontologies involved. Results are presented from a collaborative project between computer scientists and archaeologists that created lightweight tools to make it easier for non-specialists to publish Linked Data. Archaeologists used the STELLAR project tools to publish major excavation datasets as Linked Data conforming to the CIDOC CRM ontology. The template-based Extract-Transform-Load (ETL) method is described. Reflections on the experience of using the template-based tools are discussed, together with practical issues, including the need for terminology alignment and licensing considerations.
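The general shape of a template-based Extract-Transform-Load step can be illustrated with a small sketch. This is not the STELLAR tools' template syntax. Rows are extracted from a legacy CSV table, each row fills a triple template, and the resulting N-Triples are the output of the "load" step. The CSV columns (`find_id`, `find_type`, `description`), the `example.org` namespaces, the `FIND_TEMPLATE` mapping and the naive type normalisation that stands in for terminology alignment are all hypothetical. Only the CIDOC CRM, RDF and RDFS namespace URIs are real.

```python
import csv
import io
from string import Template
from urllib.parse import quote

# Real vocabulary namespaces
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
# Hypothetical namespaces for the dataset and its terminology
BASE = "http://example.org/excavation/"
VOCAB = "http://example.org/vocab/"

# Hypothetical template: each row of a legacy "finds" table becomes three triples.
FIND_TEMPLATE = Template(
    "<${base}find/${id}> <${rdf}type> <${crm}E19_Physical_Object> .\n"
    "<${base}find/${id}> <${crm}P2_has_type> <${vocab}${type}> .\n"
    '<${base}find/${id}> <${rdfs}label> "${label}" .'
)


def escape_literal(s: str) -> str:
    """Escape backslashes, quotes and newlines for an N-Triples string literal."""
    return s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def transform(csv_text: str) -> list[str]:
    """Extract rows from CSV text and fill the template for each one."""
    triples: list[str] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        filled = FIND_TEMPLATE.substitute(
            base=BASE, crm=CRM, rdf=RDF, rdfs=RDFS, vocab=VOCAB,
            id=quote(row["find_id"].strip()),
            # Naive stand-in for terminology alignment: turn free text into a concept slug
            type=quote(row["find_type"].strip().lower().replace(" ", "_")),
            label=escape_literal(row["description"]),
        )
        triples.extend(filled.splitlines())
    return triples


LEGACY_CSV = """find_id,find_type,description
F001,Coin,Bronze coin
F002,Pottery Sherd,"Rim sherd, grey ware"
"""

if __name__ == "__main__":
    # The "load" step here just prints N-Triples; a real pipeline might write to a file or triple store.
    print("\n".join(transform(LEGACY_CSV)))
```

The template keeps the ontology mapping in one declarative place. This is the property that lets someone without detailed CRM knowledge reuse it by supplying only the column values. In practice, mapping free-text types such as "Pottery Sherd" to concepts in a controlled vocabulary needs real alignment work rather than string normalisation.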