In this paper we study the diagnosis and repair of incoherent terminologies. We define a number of new nonstandard reasoning services to explain incoherence through pinpointing, and we present algorithms for all of these services. For one of the core tasks of debugging, the calculation of minimal unsatisfiability preserving subterminologies, we developed two different algorithms, one implementing a bottom-up approach using support of an external description logic reasoner, the other implementing a specialized tableau-based calculus. Both algorithms have been prototypically implemented. We study the effectiveness of our algorithms in two ways: we present a realistic case study where we diagnose a terminology used in a practical application, and we perform controlled benchmark experiments to get a better understanding of the computational properties of our algorithms in particular and the debugging problem in general.
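As a rough illustration of what a minimal unsatisfiability-preserving subterminology is, the sketch below shrinks an incoherent axiom set by repeatedly dropping axioms and re-checking coherence with a black-box checker. This is a generic deletion-based procedure, not either of the two algorithms described in the abstract. The tuple encoding of axioms and the names `toy_is_incoherent` and `shrink_to_minimal` are assumptions made for this example; the toy checker only understands atomic subsumption and disjointness and merely stands in for a real description logic reasoner.

```python
def toy_is_incoherent(axioms):
    """Toy stand-in for a DL reasoner (hypothetical, for illustration only).

    Axioms are tuples ("sub", A, B) for A subsumed by B, or ("disj", A, B)
    for A and B disjoint. The set is incoherent if some class has two
    disjoint classes among its (reflexive, transitive) superclasses.
    """
    supers = {}
    for kind, a, b in axioms:
        if kind == "sub":
            supers.setdefault(a, set()).add(b)

    def ancestors(c):
        seen, stack = {c}, [c]
        while stack:
            for d in supers.get(stack.pop(), ()):
                if d not in seen:
                    seen.add(d)
                    stack.append(d)
        return seen

    classes = {a for _, a, _ in axioms} | {b for _, _, b in axioms}
    for c in classes:
        anc = ancestors(c)
        for kind, a, b in axioms:
            if kind == "disj" and a in anc and b in anc:
                return True
    return False


def shrink_to_minimal(axioms, is_incoherent):
    """Return a subset of `axioms` that is incoherent and minimal:
    removing any single remaining axiom restores coherence."""
    if not is_incoherent(axioms):
        raise ValueError("axiom set is already coherent")
    core = list(axioms)
    i = 0
    while i < len(core):
        candidate = core[:i] + core[i + 1:]
        if is_incoherent(candidate):
            core = candidate  # axiom i is not needed for the conflict
        else:
            i += 1            # axiom i is essential, keep it
    return core


axioms = [
    ("sub", "Penguin", "Bird"),
    ("sub", "Bird", "Flyer"),
    ("sub", "Penguin", "NonFlyer"),
    ("disj", "Flyer", "NonFlyer"),
    ("sub", "Bird", "Animal"),
]
core = shrink_to_minimal(axioms, toy_is_incoherent)
```

In this toy terminology the axiom about `Animal` plays no role in the conflict, so it is the only one dropped. The procedure calls the checker once per axiom, which is why the cost of each reasoner call dominates in practice.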
Abstract. Instance-based ontology mapping is a promising family of solutions to a class of ontology alignment problems. It crucially depends on measuring the similarity between sets of annotated instances. In this paper we study how the choice of co-occurrence measures affects the performance of instance-based mapping. To this end, we have implemented a number of different statistical co-occurrence measures. We have prepared an extensive test case using vocabularies of thousands of terms, millions of instances, and hundreds of thousands of co-annotated items. We have obtained a human Gold Standard judgement for part of the mapping space. We then study how the different co-occurrence measures and a number of algorithmic variations perform on our benchmark dataset as compared against the Gold Standard. Our systematic study shows excellent results for instance-based matching in general, where the simpler measures often outperform more sophisticated statistical co-occurrence measures.
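To make the idea of a co-occurrence measure concrete, here is a minimal sketch that scores candidate mappings with the Jaccard coefficient, one well-known simple measure. The abstract does not name the measures it evaluates, so the choice of Jaccard, the function names, and the toy annotation data are all assumptions made for illustration.

```python
def jaccard(instances_a, instances_b):
    """Jaccard coefficient: shared instances divided by all instances."""
    a, b = set(instances_a), set(instances_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def rank_mappings(annotations_x, annotations_y):
    """Score every concept pair across two vocabularies.

    Each argument maps a concept name to the set of instance ids
    annotated with that concept. Pairs with no shared instance are
    skipped. Results are sorted by descending score.
    """
    scored = []
    for cx, ix in annotations_x.items():
        for cy, iy in annotations_y.items():
            s = jaccard(ix, iy)
            if s > 0:
                scored.append((cx, cy, s))
    scored.sort(key=lambda t: (-t[2], t[0], t[1]))
    return scored


# Hypothetical toy data: book ids annotated with terms from two vocabularies.
vocab_x = {"x:Painting": {1, 2, 3}, "x:Music": {7, 8}}
vocab_y = {"y:Paintings": {2, 3, 4}, "y:Opera": {8}}
ranking = rank_mappings(vocab_x, vocab_y)
```

Here `x:Painting` and `y:Paintings` share two of four instances, and `x:Music` and `y:Opera` share one of two, so both pairs score 0.5. A measure this simple needs only set sizes, which keeps it cheap on large annotated collections.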
It is widely accepted that proper data publishing is difficult. The majority of Linked Open Data (LOD) does not meet even a core set of data publishing guidelines. Moreover, datasets that are clean at creation can get stains over time. As a result, the LOD cloud now contains a high level of dirty data that is difficult for humans to clean and for machines to process. Existing solutions for cleaning data (standards, guidelines, tools) are targeted towards human data creators, who can (and do) choose not to use them. This paper presents the LOD Laundromat, which removes stains from data without any human intervention. This fully automated approach is able to make very large amounts of LOD more easily available for further processing right now. LOD Laundromat is not a new dataset, but rather a uniform point of entry to a collection of cleaned siblings of existing datasets. It provides researchers and application developers with a wealth of data that is guaranteed to conform to a specified set of best practices, thereby greatly improving the chance of data actually being (re)used.
Abstract. Ontologies are the backbone of the Semantic Web as they allow one to share vocabulary in a semantically sound way. For ontologies specified in OWL or a related web ontology language, Description Logic reasoners can often detect logical contradictions. Unfortunately, there are two drawbacks: they lack support for debugging incoherence in ontologies, and they can only be applied to reasonably expressive ontologies (containing at least some sort of negation). In this paper, we attempt to close these gaps using a technique called pinpointing. In pinpointing we identify minimal sets of axioms which need to be removed or ignored to make an ontology coherent. We then show how pinpointing can be used for debugging of web ontologies in two typical cases. More unusual is the application of pinpointing in the semantic clarification of underspecified web ontologies, which we experimentally evaluate on a number of well-known web ontologies. Our findings are encouraging: even though semantic ambiguity remains an issue, we show that pinpointing can be useful for debugging, and that it can significantly improve the quality of our semantic enrichment in a fully automatic way.
Due to the growing popularity of Description Logics-based knowledge representation systems, predominantly in the context of Semantic Web applications, there is a rising demand for tools offering non-standard reasoning services. One particularly interesting form of reasoning, both from the user as well as the ontology engineering perspective, is abduction. In this paper we introduce two novel reasoning calculi for solving ABox abduction problems in the Description Logic ALC, i.e. problems of finding minimal sets of ABox axioms, which when added to the knowledge base enforce entailment of a requested set of assertions. The algorithms are based on regular connection tableaux and resolution with set-of-support and are proven to be sound and complete. We elaborate on a number of technical issues involved and discuss some practical aspects of reasoning with the methods.
Abstract. Materialization and backward-chaining, as different modes of performing inference, have complementary advantages and disadvantages. Materialization enables very efficient responses at query time, but at the cost of an expensive up-front closure computation, which needs to be redone every time the knowledge base changes. Backward-chaining does not need such an expensive and change-sensitive precomputation, and is therefore suitable for more frequently changing knowledge bases, but has to perform more computation at query time. Materialization has been studied extensively in the recent Semantic Web literature, and is now available in industrial-strength systems. In this work, we focus instead on backward-chaining, and we present a hybrid algorithm to perform efficient backward-chaining reasoning on very large datasets expressed in the OWL Horst (pD*) fragment. As a proof of concept, we have implemented a prototype called QueryPIE (Query Parallel Inference Engine), and we have tested its performance on different datasets of up to 1 billion triples. Our parallel implementation greatly reduces the reasoning complexity of a naive backward-chaining approach and returns results for single query patterns in the order of milliseconds when running on a modest 8-machine cluster. To the best of our knowledge, QueryPIE is the first reported backward-chaining reasoner for OWL Horst that efficiently scales to a billion triples.
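The trade-off between the two inference modes can be sketched on a toy scale. The example below is not QueryPIE's hybrid algorithm and does not cover the OWL Horst rule set. It assumes only two RDFS-style rules (transitivity of `subClassOf` and inheritance of `type` along `subClassOf`), and the function names and data are made up for illustration.

```python
def materialize(triples):
    """Forward chaining: compute the full closure up front.

    Rules assumed for this sketch:
      (x type C), (C subClassOf D)       => (x type D)
      (A subClassOf C), (C subClassOf D) => (A subClassOf D)
    """
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        sub = [(s, o) for (s, p, o) in closure if p == "subClassOf"]
        new = set()
        for (s, p, o) in closure:
            if p in ("type", "subClassOf"):
                for (c, d) in sub:
                    if o == c:
                        new.add((s, p, d))
        if not new <= closure:
            closure |= new
            changed = True
    return closure


def holds_backward(triples, s, p, o, _seen=None):
    """Backward chaining: answer one ground query on demand.

    The goal (s p o) holds if it is stated explicitly, or if some class c
    with an explicit (c subClassOf o) makes the subgoal (s p c) hold.
    `_seen` records classes already tried, so cyclic hierarchies terminate.
    """
    if (s, p, o) in triples:
        return True
    if p not in ("type", "subClassOf"):
        return False
    seen = _seen if _seen is not None else {o}
    for (c, q, d) in triples:
        if q == "subClassOf" and d == o and c not in seen:
            seen.add(c)
            if holds_backward(triples, s, p, c, seen):
                return True
    return False


kb = {
    ("alice", "type", "Student"),
    ("Student", "subClassOf", "Person"),
    ("Person", "subClassOf", "Agent"),
}
closure = materialize(kb)
```

After `materialize`, any query is a set lookup, but the closure must be recomputed whenever `kb` changes. `holds_backward` works directly on the unmodified `kb` and only explores the part of the class hierarchy relevant to the query.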
Abstract. The sources of information for historical research are diverse and fill a continuum between individual accounts, transmitted for instance in letters but also in poems and songs, and aggregated statistical information, as in the case of historical censuses. Historiography shares this heterogeneity and complexity of source material with other humanities fields. Methods to order this rich material, and by this ordering also to determine the way history is told, are as old as history writing and vary among the different branches (or subdisciplines) of historical research. In this paper we focus on the work of historians, and even more specifically on economic and social history. At the crossroads of information and historical sciences, so-called Historical Informatics or History and Computing emerged as a specific profession during the 1990s. Together with computer scientists, historians created a research agenda centered on questions of how to create, design, enrich, edit, retrieve, analyze and present historical information with the help of information technology. There exist a number of problems and challenges in this field; some of them are closely related to semantics and the meaning of knowledge in general. In this context, Semantic Web technologies can be applied in a number of situations, environments, and applications of historical computing and historical information science. However, only a small number of contributions have yet considered these technologies. In this survey we present an overview of the past and present problems, challenges and advances of historical science computing, from the perspective of Semantic technology.