Information and communication infrastructures underwent a rapid and extreme decentralization process over the past decade: from a world of statically and partially connected central servers rose an intricate web of millions of information sources loosely connected to one another. Today, we expect to witness the extension of this revolution with the wide adoption of meta-data standards like RDF or OWL underpinning the creation of a semantic web. Again, we hope for global properties to emerge from a multiplicity of pair-wise, local interactions, resulting eventually in a self-stabilizing semantic infrastructure. This paper represents an effort to summarize the conditions under which this revolution would take place as well as an attempt to underline its main properties, limitations and possible applications. The work presented in this paper reflects the current status of a collaborative effort initiated by the IFIP 2.6 Working Group on Data Semantics.
Abstract. Efficient subsumption checking, deciding whether a subscription or publication is covered by a set of previously defined subscriptions, is of paramount importance for publish/subscribe systems. It provides the core system functionality (matching of publications to subscriber needs expressed as subscriptions) and additionally reduces the overall system load and generated traffic, since covered subscriptions are not propagated in distributed environments. As the subsumption problem was shown previously to be co-NP-complete and existing solutions typically apply pairwise comparisons to detect the subsumption relationship, we propose a 'Monte Carlo type' probabilistic algorithm for the general subsumption problem. It determines whether a publication/subscription is covered by a disjunction of subscriptions in O(k m d), where k is the number of subscriptions, m is the number of distinct attributes in subscriptions, and d is the number of tests performed to answer a subsumption question. The probability of error is problem-specific and typically very small, and sets an upper bound on d. Our experimental results show significant gains in terms of subscription set reduction, which has a favorable impact on overall system performance because it reduces total computational cost and network traffic. Furthermore, the theoretical bounds underestimate the algorithm's performance: it performs much better in practice thanks to the introduced optimizations, and it is adequate for fast forwarding of subscriptions under high subscription rates.
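The sampling idea behind the O(k m d) bound can be illustrated with a minimal sketch. It is not the authors' algorithm and omits their optimizations. It assumes that every subscription is a conjunction of closed numeric range constraints over the same attributes, and that test points are drawn uniformly; the names `Box`, `contains` and `probably_covered` are hypothetical. Each of the d tests checks one random point against k subscriptions with m attributes each.

```python
import random
from typing import Dict, List, Tuple

# Assumption: a subscription is a conjunction of closed ranges, attribute -> (low, high).
Box = Dict[str, Tuple[float, float]]


def contains(box: Box, point: Dict[str, float]) -> bool:
    """Return True if the point satisfies every range constraint of the box (O(m))."""
    return all(
        attr in point and lo <= point[attr] <= hi
        for attr, (lo, hi) in box.items()
    )


def probably_covered(s: Box, subs: List[Box], d: int, rng: random.Random) -> bool:
    """Monte Carlo style test of whether s is covered by the union of subs.

    False is always correct (an uncovered point was found).
    True may be wrong with a small, problem-specific probability.
    """
    for _ in range(d):
        # Draw one random point inside s.
        point = {attr: rng.uniform(lo, hi) for attr, (lo, hi) in s.items()}
        # O(k * m): is the point matched by at least one existing subscription?
        if not any(contains(b, point) for b in subs):
            return False
    return True


new_sub = {"price": (0.0, 10.0), "volume": (0.0, 10.0)}
existing = [
    {"price": (0.0, 6.0), "volume": (0.0, 10.0)},
    {"price": (5.0, 10.0), "volume": (0.0, 10.0)},
]
print(probably_covered(new_sub, existing, d=200, rng=random.Random(0)))
```

Note that no single subscription in `existing` covers `new_sub`, so a pairwise comparison would miss the subsumption, whereas the sampled test against the whole disjunction does not. The error is one-sided: a "not covered" answer is backed by a concrete witness point, and only a "covered" answer can be mistaken.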
The need for large-scale data sharing between autonomous and possibly heterogeneous decentralized systems on the Web gave rise to the concept of P2P database systems. Decentralized databases are, however, not new. Whereas a definition for a P2P database system can be readily provided, a comparison with the more established decentralized models, commonly referred to as distributed, federated and multidatabases, is more likely to provide better insight into this new P2P data management technology. Thus, in this paper, by distinguishing between db-centric and P2P-centric features, we examine features common to these database systems as well as other ad-hoc features that solely characterize P2P databases. We also provide a non-exhaustive taxonomy of the most prominent research efforts toward the realization of full-fledged P2P databases.
Abstract. Shared ontologies describe concepts and relationships to resolve semantic conflicts amongst users accessing multiple autonomous and heterogeneous information sources. We contend that while ontologies are useful in semantic reconciliation, they do not guarantee correct classification of semantic conflicts, nor do they provide the capability to handle evolving semantics or a mechanism to support a dynamic reconciliation process. Their limitations are illustrated through a conceptual analysis of several prominent examples used in heterogeneous database systems and in natural language processing. We view semantic reconciliation as a nonmonotonic query-dependent process that requires flexible interpretation of query context, and as a mechanism to coordinate knowledge elicitation while constructing the query context. We propose a system that is based on these characteristics, namely the SCOPES (Semantic Coordinator Over Parallel Exploration Spaces) system. SCOPES takes advantage of ontologies to constrain exploration of a remote database during the incremental discovery and refinement of the context within which a query can be answered. It uses an Assumption-based Truth Maintenance System (ATMS) to manage the multiple plausible contexts which coexist while the semantic reconciliation process is unfolding, and the Dempster-Shafer (DS) theory of belief to model the likelihood of these plausible contexts.
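As background for the Dempster-Shafer component, the following minimal sketch shows Dempster's rule of combination applied to two bodies of evidence about candidate contexts. It is a generic textbook illustration, not the SCOPES implementation; the context labels `c1` and `c2`, the mass values, and the function names are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet

# A mass function assigns belief mass to sets of candidate contexts.
Mass = Dict[FrozenSet[str], float]


def combine(m1: Mass, m2: Mass) -> Mass:
    """Combine two mass functions with Dempster's rule."""
    combined: Mass = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + wa * wb
        else:
            # Mass falling on the empty set measures conflicting evidence.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("evidence is totally conflicting")
    # Renormalize so the remaining masses sum to 1.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


def belief(m: Mass, hypothesis: FrozenSet[str]) -> float:
    """Total mass committed to subsets of the hypothesis."""
    return sum(w for s, w in m.items() if s <= hypothesis)


frame = frozenset({"c1", "c2"})
evidence_a: Mass = {frozenset({"c1"}): 0.6, frame: 0.4}
evidence_b: Mass = {frozenset({"c2"}): 0.5, frame: 0.5}
fused = combine(evidence_a, evidence_b)
print(belief(fused, frozenset({"c1"})))
```

Mass left on the whole frame represents uncommitted belief, which is what lets several plausible contexts coexist until further evidence narrows them down.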