2014
DOI: 10.4018/ijswis.2014040103

Improving Curated Web-Data Quality with Structured Harvesting and Assessment

Abstract: This paper describes a semi-automated process, framework and tools for harvesting, assessing, improving and maintaining high-quality linked data. The framework, known as DaCura, provides dataset curators, who may not be knowledge engineers, with tools to collect and curate evolving linked-data datasets that maintain quality over time. The framework encompasses a novel process, workflow and architecture. A working implementation has been produced and applied firstly to the publication of an existing social-s…

Cited by 18 publications (32 citation statements). References 14 publications.
“…We argue that the return in value justified the extra overhead in terms of transcription and platform complexity. Our approach is thus different from, for instance, the Dacura platform [10], which adopts crowdsourcing techniques to elicit facts from datasets such as newspaper articles according to a schema for a particular purpose.…”
Section: Discussion
confidence: 99%
“…Our current work builds upon the previous version of our Dacura data curation platform [6] by extending the simple rule-based data validation implemented in Apache Jena/Java described in our Workshop on Linked Data Quality 2014 publication [18] with a custom reasoner and ACID (Atomic, Consistent, Isolated, Durable) triple-store for validation and data integrity enforcement. This new component, the Dacura Quality Service, is built in SWI-Prolog on ClioPatria [19] and is described in the next section.…”
Section: Framework and Approaches for Assessing Linked Data Quality
confidence: 99%
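This citing statement sketches a concrete pipeline: simple rule-based validation in Apache Jena, later superseded by a custom Prolog reasoner with an ACID triple-store. The Jena sketch below illustrates the earlier rule-based style of validation; the namespace, the sample triples and the single integrity rule are illustrative assumptions, not Dacura's actual validation rules.

import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.RDFNode;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.rdf.model.StmtIterator;
import org.apache.jena.reasoner.Reasoner;
import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.Rule;

public class RuleValidationSketch {
    public static void main(String[] args) {
        String ns = "http://example.org/"; // hypothetical namespace
        Model data = ModelFactory.createDefaultModel();

        // Sample instance data with a deliberate error: the event ends before it starts.
        Resource event = data.createResource(ns + "event1");
        data.add(event, data.createProperty(ns, "startYear"), data.createTypedLiteral(1832));
        data.add(event, data.createProperty(ns, "endYear"), data.createTypedLiteral(1831));

        // One illustrative integrity rule: derive an error triple when endYear < startYear.
        String rules =
            "[badInterval: (?e <http://example.org/startYear> ?s) " +
            "              (?e <http://example.org/endYear> ?d) " +
            "              lessThan(?d, ?s) " +
            "  -> (?e <http://example.org/hasError> 'endYear precedes startYear')]";

        Reasoner reasoner = new GenericRuleReasoner(Rule.parseRules(rules));
        InfModel inf = ModelFactory.createInfModel(reasoner, data);

        // Report every violation the rule derived.
        Property hasError = inf.createProperty(ns, "hasError");
        StmtIterator errors = inf.listStatements(null, hasError, (RDFNode) null);
        while (errors.hasNext()) {
            System.out.println("Validation error: " + errors.next());
        }
    }
}

Running the class prints one derived hasError statement for event1, since its end year precedes its start year.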
“…This reasoner uses a much less permissive interpretation than that of standard OWL to find issues which are likely to stem from specification errors, even in cases where they produce valid OWL models. This tool is integrated into a general purpose ontology analysis framework in the Dacura platform [6] which identifies structural dependencies between ontologies and highlights instances of ontology hijacking.…”
Section: Introduction
confidence: 99%
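Ontology hijacking, as the statement uses the term, means redefining terms that belong to another vocabulary's namespace. The sketch below shows one naive form of the structural check alluded to: it flags definitional axioms whose subject lies outside the ontology's home namespace. The namespaces, example triples and the choice of "definitional" predicates are assumptions for illustration, not the Dacura platform's actual analysis.

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.rdf.model.Statement;
import org.apache.jena.rdf.model.StmtIterator;
import org.apache.jena.vocabulary.OWL;
import org.apache.jena.vocabulary.RDF;
import org.apache.jena.vocabulary.RDFS;

public class HijackCheckSketch {
    public static void main(String[] args) {
        String homeNs = "http://example.org/myonto#"; // hypothetical home namespace
        Model onto = ModelFactory.createDefaultModel();

        // A legitimate local class definition...
        onto.add(onto.createResource(homeNs + "Event"), RDF.type, OWL.Class);
        // ...and a redefinition of a term owned by a foreign vocabulary (FOAF).
        onto.add(onto.createResource("http://xmlns.com/foaf/0.1/Person"),
                 RDFS.subClassOf, onto.createResource(homeNs + "Agent"));

        StmtIterator it = onto.listStatements();
        while (it.hasNext()) {
            Statement st = it.next();
            Resource subj = st.getSubject();
            // Treat typing and sub-class/property axioms as "definitional".
            boolean definitional = st.getPredicate().equals(RDF.type)
                    || st.getPredicate().equals(RDFS.subClassOf)
                    || st.getPredicate().equals(RDFS.subPropertyOf);
            if (definitional && subj.isURIResource()
                    && !subj.getURI().startsWith(homeNs)) {
                System.out.println("Possible hijack: " + st);
            }
        }
    }
}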
“…We invested a significant amount of time in crafting and verifying search queries that would be broad enough to capture all of the reports that we were interested in, without producing an impossibly large number of scans to read. We created a gold-standard dataset for 1831, combining both human-expert and automated methodologies (Feeney 2014). This gave us a benchmark with which to tune search phrases that produced both high coverage and acceptable levels of false positives.…”
Section: Methodology for Constructing TCD Political Violence Dataset
confidence: 99%
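The tuning loop described in this statement amounts to scoring each candidate search phrase against the 1831 gold standard: coverage is recall over the gold set, and the level of false positives shows up as lost precision. A minimal sketch, assuming documents are identified by plain string IDs (the IDs below are invented):

import java.util.Set;

public class QueryTuningSketch {

    // Precision: fraction of retrieved documents that are truly relevant.
    static double precision(Set<String> retrieved, Set<String> gold) {
        long hits = retrieved.stream().filter(gold::contains).count();
        return retrieved.isEmpty() ? 0.0 : (double) hits / retrieved.size();
    }

    // Recall (coverage): fraction of gold-standard documents the query found.
    static double recall(Set<String> retrieved, Set<String> gold) {
        long hits = retrieved.stream().filter(gold::contains).count();
        return gold.isEmpty() ? 0.0 : (double) hits / gold.size();
    }

    public static void main(String[] args) {
        Set<String> gold = Set.of("d1", "d2", "d3", "d4"); // invented gold-standard IDs
        Set<String> retrieved = Set.of("d1", "d2", "d5");  // d5 is a false positive
        System.out.printf("precision=%.2f recall=%.2f%n",
                precision(retrieved, gold), recall(retrieved, gold));
    }
}

A broader phrase raises recall at the cost of precision; the benchmark makes that trade-off measurable per query.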