2014
DOI: 10.4018/ijswis.2014040103

Improving Curated Web-Data Quality with Structured Harvesting and Assessment

Abstract: This paper describes a semi-automated process, framework and tools for harvesting, assessing, improving and maintaining high-quality linked data. The framework, known as DaCura, provides dataset curators, who may not be knowledge engineers, with tools to collect and curate evolving linked-data datasets that maintain quality over time. The framework encompasses a novel process, workflow and architecture. A working implementation has been produced and applied firstly to the publication of an existing social-s…

Cited by 18 publications (32 citation statements). References 14 publications.
“…We argue that the return in value justified the extra overhead in terms of transcription and platform complexity. Our approach is thus different from, for instance, the Dacura platform [10], which adopts crowdsourcing techniques to elicit facts from datasets such as newspaper articles according to a schema for a particular purpose.…”
Section: Discussion
confidence: 99%
“…Our current work builds upon the previous version of our Dacura data curation platform [6] by extending the simple rule-based data validation implemented in Apache Jena/Java described in our Workshop on Linked Data Quality 2014 publication [18] with a custom reasoner and ACID (Atomic, Consistent, Isolated, Durable) triple-store for validation and data integrity enforcement. This new component, the Dacura Quality Service, is built in SWI-Prolog on ClioPatria [19] and is described in the next section.…”
Section: Framework and Approaches for Assessing Linked Data Quality
confidence: 99%
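This citing statement sketches a concrete pipeline: simple rule-based validation in Apache Jena, later superseded by a custom Prolog reasoner with an ACID triple-store. The Jena sketch below illustrates the earlier rule-based style of validation; the namespace, the sample triples and the single integrity rule are illustrative assumptions, not Dacura's actual validation rules.

import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.RDFNode;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.rdf.model.StmtIterator;
import org.apache.jena.reasoner.Reasoner;
import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.Rule;

public class RuleValidationSketch {
    public static void main(String[] args) {
        String ns = "http://example.org/"; // hypothetical namespace
        Model data = ModelFactory.createDefaultModel();

        // Sample instance data with a deliberate error: the event ends before it starts.
        Resource event = data.createResource(ns + "event1");
        data.add(event, data.createProperty(ns, "startYear"), data.createTypedLiteral(1832));
        data.add(event, data.createProperty(ns, "endYear"), data.createTypedLiteral(1831));

        // One illustrative integrity rule: derive an error triple when endYear < startYear.
        String rules =
            "[badInterval: (?e <http://example.org/startYear> ?s) " +
            "              (?e <http://example.org/endYear> ?d) " +
            "              lessThan(?d, ?s) " +
            "  -> (?e <http://example.org/hasError> 'endYear precedes startYear')]";

        Reasoner reasoner = new GenericRuleReasoner(Rule.parseRules(rules));
        InfModel inf = ModelFactory.createInfModel(reasoner, data);

        // Report every violation the rule derived.
        Property hasError = inf.createProperty(ns, "hasError");
        StmtIterator errors = inf.listStatements(null, hasError, (RDFNode) null);
        while (errors.hasNext()) {
            System.out.println("Validation error: " + errors.next());
        }
    }
}

Running the class prints one derived hasError statement for event1, since its end year precedes its start year.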
“…This reasoner uses a much less permissive interpretation than that of standard OWL to find issues which are likely to stem from specification errors, even in cases where they produce valid OWL models. This tool is integrated into a general purpose ontology analysis framework in the Dacura platform [6] which identifies structural dependencies between ontologies and highlights instances of ontology hijacking.…”
Section: Introduction
confidence: 99%
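Ontology hijacking, as the statement uses the term, means redefining terms that belong to another vocabulary's namespace. The sketch below shows one naive form of the structural check alluded to: it flags definitional axioms whose subject lies outside the ontology's home namespace. The namespaces, example triples and the choice of "definitional" predicates are assumptions for illustration, not the Dacura platform's actual analysis.

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.rdf.model.Statement;
import org.apache.jena.rdf.model.StmtIterator;
import org.apache.jena.vocabulary.OWL;
import org.apache.jena.vocabulary.RDF;
import org.apache.jena.vocabulary.RDFS;

public class HijackCheckSketch {
    public static void main(String[] args) {
        String homeNs = "http://example.org/myonto#"; // hypothetical home namespace
        Model onto = ModelFactory.createDefaultModel();

        // A legitimate local class definition...
        onto.add(onto.createResource(homeNs + "Event"), RDF.type, OWL.Class);
        // ...and a redefinition of a term owned by a foreign vocabulary (FOAF).
        onto.add(onto.createResource("http://xmlns.com/foaf/0.1/Person"),
                 RDFS.subClassOf, onto.createResource(homeNs + "Agent"));

        StmtIterator it = onto.listStatements();
        while (it.hasNext()) {
            Statement st = it.next();
            Resource subj = st.getSubject();
            // Treat typing and sub-class/property axioms as "definitional".
            boolean definitional = st.getPredicate().equals(RDF.type)
                    || st.getPredicate().equals(RDFS.subClassOf)
                    || st.getPredicate().equals(RDFS.subPropertyOf);
            if (definitional && subj.isURIResource()
                    && !subj.getURI().startsWith(homeNs)) {
                System.out.println("Possible hijack: " + st);
            }
        }
    }
}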
“…We invested a significant amount of time in crafting and verifying search queries that would be broad enough to capture all of the reports that we were interested in, without producing an impossibly large number of scans to read. We created a gold-standard dataset for 1831, combining both human-expert and automated methodologies (Feeney 2014). This gave us a benchmark with which to tune search phrases that produced both high coverage and acceptable levels of false positives.…”
Section: Methodology for Constructing TCD Political Violence Dataset
confidence: 99%
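The tuning loop described in this statement amounts to scoring each candidate search phrase against the 1831 gold standard: coverage is recall over the gold set, and the level of false positives shows up as lost precision. A minimal sketch, assuming documents are identified by plain string IDs (the IDs below are invented):

import java.util.Set;

public class QueryTuningSketch {

    // Precision: fraction of retrieved documents that are truly relevant.
    static double precision(Set<String> retrieved, Set<String> gold) {
        long hits = retrieved.stream().filter(gold::contains).count();
        return retrieved.isEmpty() ? 0.0 : (double) hits / retrieved.size();
    }

    // Recall (coverage): fraction of gold-standard documents the query found.
    static double recall(Set<String> retrieved, Set<String> gold) {
        long hits = retrieved.stream().filter(gold::contains).count();
        return gold.isEmpty() ? 0.0 : (double) hits / gold.size();
    }

    public static void main(String[] args) {
        Set<String> gold = Set.of("d1", "d2", "d3", "d4"); // invented gold-standard IDs
        Set<String> retrieved = Set.of("d1", "d2", "d5");  // d5 is a false positive
        System.out.printf("precision=%.2f recall=%.2f%n",
                precision(retrieved, gold), recall(retrieved, gold));
    }
}

A broader phrase raises recall at the cost of precision; the benchmark makes that trade-off measurable per query.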