The Web contains a vast amount of structured information such as HTML tables, HTML lists and deep-web databases; there is enormous potential in combining and re-purposing this data in creative ways. However, integrating data from this relational web raises several challenges that are not addressed by current data integration systems or mash-up tools. First, the structured data is usually not published cleanly and must be extracted (say, from an HTML list) before it can be used. Second, due to the vastness of the corpus, a user can never know all of the potentially relevant databases ahead of time (much less write a wrapper or mapping for each one); the source databases must be discovered during the integration process. Third, some of the important information regarding the data is only present in its enclosing web page and needs to be extracted appropriately. This paper describes Octopus, a system that combines search, extraction, data cleaning and integration, and enables users to create new data sets from those found on the Web. The key idea underlying Octopus is to offer the user a set of best-effort operators that automate the most labor-intensive tasks. For example, the Search operator takes a search-style keyword query and returns a set of relevance-ranked and similarity-clustered structured data sources on the Web; the Context operator helps the user specify the semantics of the sources by inferring attribute values that may not appear in the source itself; and the Extend operator helps the user find related sources that can be joined to add new attributes to a table. Octopus executes some of these operators automatically, but always allows the user to provide feedback and correct errors. We describe the algorithms underlying each of these operators and experiments that demonstrate their efficacy.
Radio frequency identification technology has become popular as an effective, low-cost solution for tagging and wireless identification. Although early RFID deployments focused primarily on industrial settings, successes have led to a boom in more personal, pervasive applications such as reminders [1] and eldercare [2]. RFID promises to enhance many everyday activities but also raises great challenges, in particular with respect to security and privacy. At the University of Washington, we've deployed the RFID Ecosystem, a pervasive computing system based on a building-wide RFID infrastructure with 80 RFID readers, 300 antennas, tens of tagged people, and thousands of tagged objects [3]. The RFID Ecosystem is a capture-and-access system that streams all data from the readers into a central database, where applications can access it. Our goal is to provide a laboratory for long-term research in security and privacy, as well as applications, data management, and systems issues for RFID-based, community-oriented pervasive computing. RFID security is a vibrant research area, with many protection mechanisms against unauthorized RFID cloning and reading attacks emerging [4]. However, little work has yet addressed the complementary issue of protecting the privacy of RFID data after an authorized system has captured and stored it. We've investigated peer-to-peer privacy for personal RFID data through an access-control policy called Physical Access Control. PAC protects privacy by constraining the data a user can obtain from the system to those events that occurred when and where that user was physically present. While strictly limiting information disclosure, PAC also affords a database view that augments users' memory of places, objects, and people. PAC is appropriate as a default level of access control because it models the physical boundaries in everyday life. Here, we focus on the privacy, utility, and security issues raised by its implementation in the RFID Ecosystem.
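The PAC rule described above (a user may see only events that occurred when and where that user was physically present) can be illustrated with a minimal sketch. The `Sighting` record, the `visible_events` function, and the `window` tolerance are hypothetical names chosen for illustration; the abstract does not specify how the RFID Ecosystem represents events or defines co-presence, so this is only one plausible reading, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sighting:
    """Hypothetical record: one tag read at a location and time."""
    tag_owner: str
    location: str
    timestamp: int  # seconds; the unit is an assumption


def visible_events(user, sightings, window=0):
    """Return the sightings a user may see under a PAC-style rule.

    A sighting is visible only if the user's own tag was read at the
    same location within `window` seconds of it (an assumed definition
    of "physically present").
    """
    own = [s for s in sightings if s.tag_owner == user]
    return [
        s
        for s in sightings
        if any(
            s.location == o.location
            and abs(s.timestamp - o.timestamp) <= window
            for o in own
        )
    ]


log = [
    Sighting("alice", "room1", 10),
    Sighting("bob", "room1", 10),
    Sighting("carol", "room2", 10),
    Sighting("bob", "room1", 100),
]
alice_view = visible_events("alice", log)
```

In this toy log, Alice's view contains her own sighting and Bob's sighting in the same room at the same time, but not Carol's (different place) or Bob's later visit (different time).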
Privacy and utility in pervasive architectures
The 18th-century legal philosopher Jeremy Bentham first described the perfect architecture for surveillance: the panopticon, a prison that arranges its cells about a central tower from which a guard can monitor every cell while remaining invisible to the inmates. The architecture's innovation is that the guard's presence becomes unnecessary except for occasional public demonstrations of power. Many privacy concerns in pervasive computing stem from a similar potential for an unseen observer to access and act on data about someone else. Under these conditions, the "state of conscious and permanent visibility [assures] the automatic functioning of power" [5] because individuals must constantly conform to the code of conduct their peers or superiors hold them to. Just as surveillance can be built into an architecture, so can privacy assurances. Our funda
To protect the privacy of RFID data after an authorized system captures it, this policy-based approach constrains the data users can access to system events that occurred whe...
Mobile and pervasive applications frequently rely on devices such as RFID antennas or sensors (light, temperature, motion) to provide them with information about the physical world. These devices, however, are unreliable. They produce streams of information where portions of data may be missing, duplicated, or erroneous. The current state of the art is to correct errors locally (e.g., range constraints for temperature readings) or to use spatial/temporal correlations (e.g., smoothing temperature readings). However, errors are often apparent only in a global setting, e.g., missed readings of objects that are known to be present, or exit readings from a parking garage without matching entry readings. In this paper, we present StreamClean, a system for correcting input data errors automatically using application-defined global integrity constraints. Because it is frequently impossible to make corrections with certainty, we propose a probabilistic approach, where the system assigns to each input tuple the probability that it is correct. We show that StreamClean handles a large class of input data errors, and corrects them sufficiently fast to keep up with input rates of many mobile and pervasive applications. We also show that the probabilities assigned by StreamClean correspond to a user's intuitive notion of correctness.
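The parking-garage example above can be made concrete with a small sketch of checking one global integrity constraint: every exit reading must be preceded by an unmatched entry reading for the same car. The function name `unmatched_exits` and the `(timestamp, car_id, kind)` tuple layout are assumptions made for illustration. The sketch only detects violations; it does not reproduce StreamClean's probabilistic assignment of correctness to each tuple.

```python
def unmatched_exits(readings):
    """Return exit readings that have no earlier matching entry.

    `readings` is an iterable of (timestamp, car_id, kind) tuples,
    where kind is "entry" or "exit" (a hypothetical layout).
    """
    open_entries = {}  # car_id -> number of entries not yet matched
    violations = []
    for ts, car, kind in sorted(readings):  # process in time order
        if kind == "entry":
            open_entries[car] = open_entries.get(car, 0) + 1
        elif kind == "exit":
            if open_entries.get(car, 0) > 0:
                open_entries[car] -= 1
            else:
                # Constraint violated: likely a missed entry reading
                # or a spurious exit reading.
                violations.append((ts, car, kind))
    return violations


stream = [
    (1, "A", "entry"),
    (5, "A", "exit"),
    (7, "B", "exit"),  # no entry for car B was ever read
]
suspect = unmatched_exits(stream)
```

A violation of this kind cannot be seen by inspecting a single reading in isolation, which is why the abstract argues for global constraints rather than purely local corrections.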
No abstract
Motivated by eScience applications, we explore automatic generation of example "starter" queries over unstructured collections of tables without relying on a schema, a query log, or prior input from users. Such example queries are demonstrably sufficient to have non-experts self-train and become productive using SQL, helping to increase the uptake of database technology among scientists. Our method is to learn a model for each relational operator based on example queries from public databases, then assemble queries syntactically operator-by-operator. For example, the likelihood that a pair of attributes will be used as a join condition in an example query depends on the cardinality of their intersection, among other features. Our demonstration illustrates that datasets with different statistical properties lead to different sets of example queries with different properties.
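The join-condition feature mentioned above, the cardinality of the intersection of two attributes' values, can be computed as in the following sketch. The function `rank_join_candidates` and the nested-dictionary table layout are hypothetical. The abstract describes a learned model that combines this feature with others; the sketch ranks attribute pairs by this single feature only and is not that model.

```python
from itertools import combinations


def rank_join_candidates(tables):
    """Rank cross-table attribute pairs by value-overlap cardinality.

    `tables` maps table name -> {column name -> list of values}
    (an assumed in-memory layout). Pairs with no shared values
    are omitted.
    """
    scored = []
    for (t1, cols1), (t2, cols2) in combinations(tables.items(), 2):
        for c1, v1 in cols1.items():
            for c2, v2 in cols2.items():
                overlap = len(set(v1) & set(v2))
                if overlap:
                    scored.append((overlap, f"{t1}.{c1}", f"{t2}.{c2}"))
    # Largest overlap first.
    return sorted(scored, reverse=True)


tables = {
    "samples": {"id": [1, 2, 3], "site": ["a", "b", "a"]},
    "readings": {"sample_id": [2, 3, 4], "value": [0.1, 0.2, 0.3]},
}
candidates = rank_join_candidates(tables)
```

Here `samples.id` and `readings.sample_id` share two values, so they surface as the only join candidate, which could then seed a starter query that joins the two tables on those columns.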
The Cascadia system provides RFID-based pervasive computing applications with an infrastructure for specifying, extracting and managing meaningful high-level events from raw RFID data. Cascadia allows users to specify events of interest using a graphical interface with an intuitive visual language. Cascadia also effectively extracts these events from data in spite of the unreliability of RFID technology and the inherent ambiguity in event extraction. We demonstrate Cascadia's technique through a digital diary application in the form of a calendar. Cascadia automatically populates the calendar with meaningful events for the user. We use data collected in a building-wide RFID deployment.