In the open world of the (Semantic) Web, where increasingly diverse materials from disparate sources of varying quality are being made available, an automatic mechanism for providing provenance information about these sources is needed. This paper describes voidp, a provenance extension for the void vocabulary that allows data publishers to specify the provenance relationships of their data. We enumerate voidp's classes and properties and describe a use case scenario. A wider uptake of voidp by dataset publishers will allow data-consuming tools to take advantage of these metadata, providing consumers with the origin, i.e., the provenance, of what is being consumed.
A single datum or a set of categorical data has little value on its own. Combining disparate sets of data increases the value of those data sets and helps to discover interesting patterns or relationships, facilitating the construction of new applications and services. In this paper, we describe an implementation that uses open geographical data as a core set of "join points" to mesh different public datasets. We describe the challenges faced during the implementation, which include sourcing the datasets, publishing them as linked data, normalising these linked data by finding the appropriate "join points" in the individual datasets, and developing the client application used for data consumption. We describe the design decisions and our solutions to these challenges. We conclude by drawing some general principles from this work.
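To make the idea of a geographic "join point" concrete, the following is a minimal sketch, not the implementation described in the abstract. It assumes each dataset is a list of records carrying a hypothetical `area` field, and that normalising the identifier (removing spaces, upper-casing) is enough to make equivalent codes match. The sample records are made up for illustration.

```python
from collections import defaultdict


def normalise_code(raw):
    """Normalise a geographic identifier so equivalent codes compare equal.

    Assumption: stripping spaces and upper-casing is sufficient here.
    """
    return raw.replace(" ", "").upper()


def join_on_area(*datasets):
    """Merge records from several datasets that share a geographic join point.

    Each dataset is a list of dicts with an 'area' key (a hypothetical name).
    """
    merged = defaultdict(dict)
    for records in datasets:
        for record in records:
            key = normalise_code(record["area"])
            for field, value in record.items():
                if field != "area":
                    merged[key][field] = value
    return dict(merged)


# Made-up illustrative records, not data from the paper.
crime = [{"area": "so17 1bj", "incidents": 12}]
schools = [{"area": "SO171BJ", "schools": 3}]
combined = join_on_area(crime, schools)
```

Here the two records describe the same area under differently formatted codes, so normalisation is what lets them merge into a single entry.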
The ready availability of data is increasing the opportunities for their re-use in new applications and analyses. Most of these data are not necessarily in the format users want, and are usually heterogeneous and highly dynamic, which necessitates data transformation efforts to re-purpose them. Interactive data transformation (IDT) tools are becoming readily available to lower these barriers to data transformation. This paper describes a principled way to capture the data lineage of interactive data transformation processes. We provide a formal model of IDT, its mapping to a provenance representation, and its implementation and validation on Google Refine. Providing the sequence of data transformation steps allows assessment of data quality and ensures portability between IDT and other data transformation platforms. The proposed model showed a high level of coverage against a set of requirements used for evaluating systems that provide provenance management solutions.
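As an illustration of what capturing lineage for a transformation sequence can look like, here is a minimal sketch. It is not the paper's formal model and does not use Google Refine's API; the `Step` and `TrackedTable` names and their fields are hypothetical, chosen only to show each operation being recorded as it is applied.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One recorded transformation: what was done, where, and its effect."""
    operation: str
    column: str
    rows_changed: int


@dataclass
class TrackedTable:
    """A table (list of dict rows) that logs every transformation applied."""
    rows: list
    history: list = field(default_factory=list)

    def transform(self, operation, column, func):
        """Apply func to one column and append a Step to the history."""
        changed = 0
        for row in self.rows:
            new_value = func(row[column])
            if new_value != row[column]:
                row[column] = new_value
                changed += 1
        self.history.append(Step(operation, column, changed))
        return self


# Made-up sample data for illustration.
table = TrackedTable([{"city": " london "}, {"city": "Paris"}])
table.transform("trim", "city", str.strip).transform("uppercase", "city", str.upper)
```

The recorded `history` can then be replayed or inspected independently of the tool that produced it, which is the kind of portability the abstract refers to.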
This paper describes the design and implementation of backward-chained, clustered RDFS reasoning in 4store. The system presented, called "4s-reasoner", adds no overhead to the import phase and yet performs reasonably well at the query phase. We also demonstrate that our solution scales over clusters of commodity servers, providing an optimal solution that balances infrastructure cost and performance over tested data sets with up to 500M triples. In addition, we have shared our implementation under a GNU license, and a first release is available for use by the community.
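The general idea of backward chaining for RDFS can be sketched as follows: nothing is inferred at import time, and the class hierarchy is expanded only when a query is answered. This is a single-machine toy over an in-memory list of triples, not the 4s-reasoner implementation; the function names and the `ex:` data are hypothetical, and only the `rdfs:subClassOf` entailment for `rdf:type` is covered.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def subclasses_of(triples, cls):
    """Return cls plus every class that is transitively a subclass of it."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == SUBCLASS and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def instances_of(triples, cls):
    """Answer '?x rdf:type cls' by expanding the query, not the stored data."""
    targets = subclasses_of(triples, cls)
    return {s for s, p, o in triples if p == RDF_TYPE and o in targets}


# Made-up sample triples for illustration.
triples = [
    ("ex:Dog", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
    ("ex:rex", RDF_TYPE, "ex:Dog"),
    ("ex:tweety", RDF_TYPE, "ex:Bird"),
]
animals = instances_of(triples, "ex:Animal")
```

Because the stored triples are never modified, importing data costs nothing extra; the price is paid at query time, which matches the trade-off the abstract describes.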