A key challenge for the Semantic Web is to acquire the capability to effectively query large knowledge bases. As there will be several competing systems, we need benchmarks that will objectively evaluate these systems. Development of effective benchmarks in an emerging domain is a challenging endeavor. In this paper, we propose a requirements-driven framework for developing benchmarks for Semantic Web Knowledge Base Systems (SW KBSs). We make two major contributions. First, we provide a list of requirements for SW KBS benchmarks, which can serve as an unbiased guide both to benchmark developers and to personnel responsible for systems acquisition and benchmarking. Second, we provide an organized collection of techniques and tools needed to develop such benchmarks. In particular, the collection contains a detailed guide for generating benchmark workloads, defining performance metrics, and interpreting experimental results.
In this work we adapt an efficient information integration algorithm to identify the minimal set of potentially relevant Semantic Web data sources for a given query. The vast majority of these sources are files written in RDF or OWL format and must be processed in their entirety. Our adaptation enhances the algorithm with taxonomic reasoning, defines and uses a mapping language for aligning heterogeneous Semantic Web ontologies, and introduces a concept of source relevance to reduce the number of sources that must be considered for a given query. After source selection, we load the selected sources into a Semantic Web reasoner to obtain a sound and complete answer to the query. We have conducted an experiment using synthetic ontologies and data sources that demonstrates that our system performs well over a wide range of queries. A typical response time for a substantial workload of 50 domain ontologies, 80 map ontologies, and 500 data sources is less than 2 seconds. Furthermore, our system returned correct answers to 200 randomly generated queries in several workload configurations. We have also compared our adaptation with a basic implementation of the original information integration algorithm that performs no taxonomic reasoning. In the most complex configuration, with 50 domain ontologies, 100 map ontologies, and 1000 data sources, our system returns complete answers to all queries, whereas the basic implementation returns complete answers to only 28% of them.
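The abstract above does not give the algorithm's details, but the role of taxonomic reasoning in source selection can be illustrated with a minimal sketch. The class names, taxonomy shape, and source descriptions below are hypothetical, not taken from the paper: a source advertising instances of a subclass is treated as relevant to a query over its superclass, so the query class is expanded through the subclass hierarchy before matching.

```python
# Hypothetical sketch: taxonomic reasoning during source selection.
# A source providing instances of a subclass is relevant to a query
# over a superclass, so we expand the query class downward through
# the hierarchy before matching it against source descriptions.

def subclasses_of(taxonomy, cls):
    """All classes at or below cls in a {class: [direct subclasses]} map."""
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(taxonomy.get(c, []))
    return seen

def relevant_sources(sources, taxonomy, query_class):
    """sources maps a source id to the set of classes it provides."""
    wanted = subclasses_of(taxonomy, query_class)
    return {sid for sid, classes in sources.items() if classes & wanted}

taxonomy = {"Publication": ["Article", "Book"], "Article": ["JournalArticle"]}
sources = {"s1": {"JournalArticle"}, "s2": {"Book"}, "s3": {"Person"}}
print(sorted(relevant_sources(sources, taxonomy, "Publication")))  # ['s1', 's2']
```

Without the taxonomic expansion, a query over Publication would match neither s1 nor s2, which mirrors the incompleteness the authors observed in the basic implementation.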
We present a method for rapid development of benchmarks for Semantic Web knowledge base systems. At its core is a synthetic data generation approach for OWL that is scalable and models real-world data. The data generation algorithm learns from real domain documents and generates benchmark data based on the extracted properties relevant for benchmarking. We believe this is important because the relative performance of systems varies depending on the structure of the ontology and data used. However, due to the novelty of the Semantic Web, we rarely have sufficient data for benchmarking. Our approach overcomes the problem of insufficient real-world data and allows us to develop benchmarks for a variety of domains and applications in a time-efficient manner. Based on our method, we have created a new Lehigh BibTeX Benchmark and conducted an experiment on four Semantic Web knowledge base systems. We have verified our hypothesis about the need for representative data by comparing the experimental results to those of our previous Lehigh University Benchmark. The differences between the two experiments demonstrate the influence of ontology and data on the capability and performance of the systems, and thus the need for a benchmark representative of the systems' intended application.
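The core idea of learning from real documents and then generating representative synthetic data can be sketched as follows. This is not the paper's algorithm, only a simplified illustration under assumed inputs: we extract one statistical property (the class-frequency distribution) from sample instance data and draw synthetic instances that preserve those proportions.

```python
import random
from collections import Counter

# Simplified illustration of learned synthetic data generation:
# extract the class-frequency distribution from sample instances,
# then generate synthetic instances preserving those proportions.

def class_distribution(instances):
    """instances: list of (instance_id, class_name) pairs from real data."""
    counts = Counter(cls for _, cls in instances)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}

def generate(dist, n, seed=0):
    """Draw n synthetic (id, class) pairs according to the learned distribution."""
    rng = random.Random(seed)
    classes = list(dist)
    weights = [dist[c] for c in classes]
    return [(f"inst{i}", rng.choices(classes, weights)[0]) for i in range(n)]

real = [("a", "Article"), ("b", "Article"), ("c", "Book")]
dist = class_distribution(real)      # Article: 2/3, Book: 1/3
synthetic = generate(dist, 300)
```

A full benchmark generator would learn many more properties (property usage, link structure, literal distributions), but the extract-then-sample pattern is the same.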
A distributed, end-to-end information integration system that is based on the Semantic Web architecture is of considerable interest to both commercial and government organizations. However, there are a number of challenges that have to be resolved to build such a system given the currently available Semantic Web technologies. We describe here the ISENS prototype system we designed, implemented, and tested (on a small scale) to address this problem. We discuss certain system limitations (some coming from the underlying technologies used) and future ISENS development to resolve them and to enable an extended set of capabilities.
In recent years, there has been an explosion of publicly available RDF and OWL web pages. Typically, these pages are small, heterogeneous, and frequently changing. To integrate them effectively, we propose to adapt a query reformulation algorithm and combine it with an information-retrieval-inspired index in order to select all sources relevant to a query. We treat each RDF document as a bag of URIs and literals and build an inverted index over them. Our system first reformulates the user's query into a set of subgoals and then translates these into Boolean queries against the index to determine which sources are relevant. Finally, the selected data sources and the relevant ontology mappings are used in conjunction with a description logic reasoner to provide an efficient query answering solution for the Semantic Web. We have evaluated our system using ontology mappings and ten million real-world data sources.
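The bag-of-URIs indexing step described above can be sketched in a few lines. The source ids, URIs, and literals below are invented for illustration: each document contributes its terms to an inverted index, and a conjunctive (AND) Boolean query over a subgoal's constants returns the candidate sources.

```python
from collections import defaultdict

# Hypothetical sketch: treat each RDF document as a bag of URIs and
# literals, build an inverted index, and answer a conjunctive Boolean
# query to select the sources relevant to a query subgoal.

def build_inverted_index(documents):
    """documents: mapping of source id -> iterable of terms (URIs/literals)."""
    index = defaultdict(set)
    for source_id, terms in documents.items():
        for term in terms:
            index[term].add(source_id)
    return index

def select_sources(index, subgoal_terms):
    """Return sources containing every constant term of a subgoal."""
    postings = [index.get(t, set()) for t in subgoal_terms]
    if not postings:
        return set()
    result = set(postings[0])
    for p in postings[1:]:
        result &= p
    return result

docs = {
    "src1": ["ex:Person", "ex:name", '"Alice"'],
    "src2": ["ex:Person", "ex:name", '"Bob"'],
    "src3": ["ex:City", "ex:label", '"Berlin"'],
}
index = build_inverted_index(docs)
print(sorted(select_sources(index, ["ex:Person", '"Alice"'])))  # ['src1']
```

Intersecting postings lists this way is the standard inverted-index treatment of Boolean AND queries; only the sources surviving the intersection need to be loaded into the reasoner.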