Many LOD datasets, such as DBpedia and LinkedGeoData, are voluminous and serve large numbers of requests from diverse applications. Many data products and services rely on full or partial local LOD replications to ensure faster querying and processing. While such replicas enhance the flexibility of information sharing and integration infrastructures, they also introduce data duplication with all the associated undesirable consequences. Because the original, authoritative datasets evolve, keeping replicas consistent and up to date requires frequent replacements at great cost. In this paper, we introduce an approach for interest-based RDF update propagation, which propagates only interesting parts of updates from the source to the target dataset. Effectively, this enables remote applications to 'subscribe' to relevant datasets and consistently reflect the necessary changes locally without the need to frequently replace the entire dataset (or a relevant subset). Our approach is based on a formal definition of graph-pattern-based interest expressions that are used to filter interesting parts of updates from the source. We implement the approach in the iRap framework and perform a comprehensive evaluation based on DBpedia Live updates to confirm the validity and value of our approach.
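The idea of filtering an update against an interest expression can be illustrated with a minimal Python sketch. It is not the iRap implementation: the tuple-based triple representation, the `?`-prefixed variables, and the helper names `match` and `filter_update` are assumptions made for illustration, and each triple pattern is matched independently, whereas a full graph-pattern interest expression would also join bindings across patterns.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# A triple is (subject, predicate, object); terms starting with "?" are variables.
# This representation is an assumption for illustration only.
Triple = Tuple[str, str, str]


def is_var(term: str) -> bool:
    """Return True if the term is a pattern variable such as '?s'."""
    return term.startswith("?")


def match(pattern: Triple, triple: Triple) -> Optional[Dict[str, str]]:
    """Match one triple pattern against one triple.

    Returns the variable bindings on success, or None if the triple
    does not match the pattern.
    """
    bindings: Dict[str, str] = {}
    for p, t in zip(pattern, triple):
        if is_var(p):
            # A repeated variable must bind to the same term everywhere.
            if bindings.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return bindings


def filter_update(patterns: List[Triple], triples: Iterable[Triple]) -> List[Triple]:
    """Keep only the triples of an update that match at least one pattern."""
    return [t for t in triples if any(match(p, t) is not None for p in patterns)]


# Hypothetical interest: films and their directors.
interest = [("?s", "rdf:type", "dbo:Film"), ("?s", "dbo:director", "?d")]

# Hypothetical set of added triples from one source update.
added = [
    ("dbr:Inception", "rdf:type", "dbo:Film"),
    ("dbr:Inception", "dbo:director", "dbr:Christopher_Nolan"),
    ("dbr:Berlin", "dbo:populationTotal", "3500000"),
]

interesting_added = filter_update(interest, added)
```

Here `interesting_added` keeps the two film-related triples and drops the one about Berlin, so only that part of the update would need to be forwarded to the subscribing replica. The same filter could be applied to the removed triples of an update.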
Abstract. Linking Data initiatives have fostered the publication of a large number of RDF datasets in the Linked Open Data (LOD) cloud, as well as the development of query processing infrastructures to access these data in a federated fashion. However, different experimental studies have shown that the availability of LOD datasets cannot always be ensured, so RDF data replication is required to build reliable federated query frameworks. Although it enhances data availability, RDF data replication requires synchronization and conflict resolution when replicas and source datasets are allowed to change over time, i.e., co-evolution management needs to be provided to ensure consistency. In this paper, we tackle the problem of RDF data co-evolution and devise an approach for conflict resolution during co-evolution of RDF datasets. Our proposed approach is property-oriented and allows for exploiting semantics about RDF properties during co-evolution management. The quality of our approach is empirically evaluated in different scenarios on the DBpedia-live dataset. Experimental results suggest that the proposed techniques have a positive impact on the quality of data in source datasets and replicas.
The current decade has witnessed an enormous explosion of data being published on the Web as Linked Data to maximise its reusability. Answering questions that users speak or write in natural language is an increasingly popular application scenario for Web Data, especially when the questions are not limited to a domain where dedicated curated datasets exist, as in medicine. The increasing use of Web Data in this and other settings has highlighted the importance of assessing its quality. While considerable work has been done on assessing the quality of Linked Data, only a few efforts have been dedicated to quality assessment of Linked Data from the question answering (QA) perspective. From the Linked Data quality metrics that have so far been well documented in the literature, we have identified those that are most relevant for QA. We apply these quality metrics, implemented in the Luzzu framework, to subsets of two datasets of crucial importance to open-domain QA, namely DBpedia and Wikidata, and thus present the first assessment of the quality of these datasets for QA. From these datasets, we assess slices covering the specific domains of restaurants, politicians, films and soccer players. The results of our experiments suggest that for most of these domains, the quality of Wikidata with regard to the majority of relevant metrics is higher than that of DBpedia.