Typical approaches for querying structured Web Data collect (crawl) and pre-process (index) large amounts of data in a central data repository before allowing for query answering. However, this time-consuming pre-processing phase however leverages the benefits of Linked Data -where structured data is accessible live and up-to-date at distributed Web resources that may change constantly -only to a limited degree, as query results can never be current. An ideal query answering system for Linked Data should return current answers in a reasonable amount of time, even on corpora as large as the Web. Query processors evaluating queries directly on the live sources require knowledge of the contents of data sources. In this paper, we develop and evaluate an approximate index structure summarising graph-structured content of sources adhering to Linked Data principles, provide an algorithm for answering conjunctive queries over Linked Data on the Web exploiting the source summary, and evaluate the system using synthetically generated queries. The experimental results show that our lightweight index structure enables complete and up-to-date query results over Linked Data, while keeping the overhead for querying low and providing a satisfying source ranking at no additional cost.
No abstract
Peer Data Management Systems (PDMS) are a natural extension of heterogeneous database systems. One of the main tasks in such systems is efficient query processing. Insisting on complete answers, however, leads to asking almost every peer in the network. Relaxing these completeness requirements by applying approximate query answering techniques can significantly reduce costs. Since most users are not interested in the exact answers to their queries, rank-aware query operators like top-k or skyline play an important role in query processing. In this paper, we present the novel concept of relaxed skylines that combines the advantages of both rank-aware query operators and approximate query processing techniques. Furthermore, we propose a strategy for processing relaxed skylines in distributed environments that allows for giving guarantees for the completeness of the result using distributed data summaries as routing indexes.
No abstract
Dealing with discrepancies in data is still a big challenge in data integration systems. The problem occurs both during eliminating duplicates from semantic overlapping sources as well as during combining complementary data from different sources. Though using SQL operations like grouping and join seems to be a viable way, they fail if the attribute values of the potential duplicates or related tuples are not equal but only similar by certain criteria. As a solution to this problem, we present in this paper similarity-based variants of grouping and join operators. The extended grouping operator produces groups of similar tuples, the extended join combines tuples satisfying a given similarity condition. We describe the semantics of this operator, discuss efficient implementations for the edit distance similarity and present evaluation results. Finally, we give examples of application from the context of a data reconciliation project for looted art.
No abstract
Essential information is often conveyed in illustrations in biomedical publications. A clinician's decision to access the full text when searching for evidence in support of clinical decision is frequently based solely on a short bibliographic reference. We seek to automatically augment these references with images from the article that may assist in finding evidence.The feasibility of automatically classifying images by usefulness (utility) in finding evidence was explored using supervised machine learning. We selected 2004 --2005 issues of the British Journal of Oral and Maxillofacial Surgery, manually annotating 743 images by utility and modality (radiological, photo, etc.) Image data, figure captions, and paragraphs surrounding figure discussions in text were used in classification.Automatic image classification achieved 84.3% accuracy using image captions for modality and 76.6% accuracy combining captions and image data for utility.Our results indicate that automatic augmentation of bibliographic references with relevant images is feasible.
In recent years the support for index tuning as part of physical database design has gained focus in research and product development, which resulted in index and design advisors. Nevertheless, these tools provide a one-off solution for a continuous task and are not deeply integrated with the DBMS functionality by only applying the query optimizer for index recommendation and profit estimation and decoupling the decision about and execution of index configuration changes from the core system functionality. In this paper we propose an approach that continuously collects statistics for recommended indexes and based on this, repetitively solves the Index Selection Problem (ISP). A key novelty is the on-the-fly index generation during query processing implemented by new query plan operators IndexBuildScan and SwitchPlan. Finally, we present the implementation and evaluation of the introduced concepts as part of the PostgreSQL system.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
334 Leonard St
Brooklyn, NY 11211
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.