Learning to create data-integrating queries

Talukdar, Partha; Jacob, Marie; Mehmood, Muhammad Salman; Crammer, Koby; Pereira, Fernando; Guha, Sudipto

doi:10.14778/1453856.1453941

Cited by 55 publications (90 citation statements)

References 39 publications (43 reference statements)


“…In this paper, we described an approach in which we assumed that the integration schema together with a set of candidate schema mappings are given, and showed how those mappings can be annotated and refined over time to identify the mappings that meet user requirements. While the techniques we presented in this paper are in the spirit of dataspaces, they represent a point in the broad space of dataspace solutions, which includes aspects such as bootstrapping dataspaces to provide users with services from the start [49], providing users with the means for querying heterogeneous data sources in the absence of schema mappings, using e.g., keyword search [52,51], indexing dataspaces [17], profiling dataspaces in the absence of schema information [30]. In this section, we analyze and compare these proposals to ours.…”

Section: Dataspaces (mentioning)

confidence: 99%

Incrementally improving dataspaces based on user feedback

Belhajjame, Paton, Embury, et al. 2013

Information Systems


One aspect of the vision of dataspaces has been articulated as providing various benefits of classical data integration with reduced up-front costs. In this paper, we present techniques that aim to support schema mapping specification through interaction with end users in a pay-as-you-go fashion. In particular, we show how schema mappings that are obtained automatically using existing matching and mapping generation techniques can be annotated with metrics estimating their fitness to user requirements using feedback on query results obtained from end users. Using the annotations computed on the basis of user feedback, and given user requirements in terms of precision and recall, we present a method for selecting the set of mappings that produce results meeting the stated requirements. In doing so, we cast mapping selection as an optimization problem. Feedback may reveal that the quality of schema mappings is poor. We show how mapping annotations can be used to support the derivation of better quality mappings from existing mappings through refinement. An evolutionary algorithm is used to efficiently and effectively explore the large space of mappings that can be obtained through refinement. User feedback can also be used to annotate the results of the queries that the user poses against an integration schema. We show how estimates for precision and recall can be computed for such queries. We also investigate the problem of propagating feedback about the results of (integration) queries down to the mappings used to populate the base relations in the integration schema.


“…Matches are chosen based on their utility with respect to a query workload that is provided in advance. ORCHESTRA [62,36], a collaborative data sharing system, covering the three phases initialization, usage and improvement, uses a generic graph structure to store the schemas and matches between schema elements, which are derived semi-automatically and annotated with costs representing the bias of the system against using the matches. Mappings in the form of query templates are derived from keyword queries posed by the user and matched against the schemas and matches.…”

Section: Dataspace Management Systems (mentioning)

confidence: 99%

“…u := gather(r) This operation is construed as providing the means by which a set u of feedback instances can be gathered. Incremental improvement based on user feedback can take a variety of forms: through the manual provision of mappings (e.g., [64]); through the annotation of query results as to which items are spurious or which should be ranked higher (e.g., [62]); through a more intensively interactive approach requiring a fair amount of user input during the integration process (e.g., [37]), or through a process by which mappings are debugged (e.g., [50]). We observe that all these approaches require, to different degrees, an understanding of the syntax and semantics of mapping and schema languages on the part of the person providing the feedback.…”

Section: Dataspace-specific Operations (mentioning)

confidence: 99%

“…In our own work, we have focussed on query evaluation, but there are many other ways in which a dataspace could be used, e.g., browsing [35], keyword searching [14,43,46,62], or interaction based on the notion of trails [18,64]. Improvement: This stage is characteristic of dataspaces and aims to counteract the shortcomings ensuing from the reliance on automation for bootstrapping (the other characteristic feature of DSMSs in this context).…”

Section: Introduction (mentioning)

confidence: 99%

“…In our own work [6], we have used feedback on query results to annotate, select and refine the collection of mappings that can be used to answer a given query by taking into account the possibilities for trading off precision and recall. It is possible and useful to gather other kinds of feedback, e.g., on queries [15]; on mappings [1]; or for query specification [61,62]. Maintenance: Changes to the sources must be propagated throughout.…”

Section: Introduction (mentioning)

confidence: 99%


A Functional Model for Dataspace Management Systems

Hedeler, Fernandes, Belhajjame, et al. 2013

Advanced Query Processing


Dataspace management systems (DSMSs) hold the promise of pay-as-you-go data integration. We describe a comprehensive model of DSMS functionality using an algebraic style. We begin by characterizing a dataspace life cycle and highlighting opportunities for both automation and user-driven improvement techniques. Building on the observation that many of the techniques developed in model management are of use in data integration contexts as well, we briefly introduce the model management area and explain how previous work on both data integration and model management needs extending if the full dataspace life cycle is to be supported. We show that many model management operators already enable important functionality (e.g., the merging of schemas, the composition of mappings, etc.) and formulate these capabilities in an algebraic structure, thereby giving rise to the notion of the core functionality of a DSMS as a many-sorted algebra. Given this view, we show how core tasks in the dataspace life cycle can be enacted by means of algebraic programs. An extended case study illustrates how such algebraic programs capture a challenging, practical scenario.


BioBrowsing: Making the Most of the Data Available in Entrez

Cohen-Boulakia, Masini 2009

Lecture Notes in Computer Science

