Michael Flaster scite author profile

Data integrated from multiple sources may contain inconsistencies that violate integrity constraints. The constraint repair problem attempts to find "low cost" changes that, when applied, will cause the constraints to be satisfied. While in most previous work repair cost is stated in terms of tuple insertions and deletions, we follow recent work to define a database repair as a set of value modifications. In this context, we introduce a novel cost framework that allows for the application of techniques from record-linkage to the search for good repairs. We prove that finding minimal-cost repairs in this model is NP-complete in the size of the database, and introduce an approach to heuristic repair-construction based on equivalence classes of attribute values. Following this approach, we define two greedy algorithms. While these simple algorithms take time cubic in the size of the database, we develop optimizations inspired by algorithms for duplicate-record detection that greatly improve scalability. We evaluate our framework and algorithms on synthetic and real data, and show that our proposed optimizations greatly improve performance at little or no cost in repair quality.
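The equivalence-class idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a single functional dependency, a unit cost per changed cell (the abstract's cost framework is richer), and hypothetical names such as `repair_fd`, `lhs`, and `rhs`. Tuples that agree on the left-hand side form one equivalence class, and each class is greedily assigned the value that changes the fewest cells.

```python
from collections import Counter, defaultdict

def repair_fd(rows, lhs, rhs):
    """Repair one functional dependency lhs -> rhs by value modification.

    Illustrative assumption: every changed cell costs 1.
    Returns (repaired_rows, total_cost); the input rows are not mutated.
    """
    # Tuples that agree on lhs must agree on rhs, so their rhs cells
    # form one equivalence class that must end up with a single value.
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in lhs)].append(i)

    repaired = [dict(r) for r in rows]
    cost = 0
    for members in classes.values():
        counts = Counter(rows[i][rhs] for i in members)
        # Greedy choice: the most frequent value changes the fewest cells.
        # Ties are broken by value order so the result is deterministic.
        target = min(counts, key=lambda v: (-counts[v], v))
        for i in members:
            if repaired[i][rhs] != target:
                repaired[i][rhs] = target
                cost += 1
    return repaired, cost

rows = [
    {"zip": "10001", "city": "NYC"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "10001", "city": "New York"},
    {"zip": "07974", "city": "Murray Hill"},
]
repaired, cost = repair_fd(rows, ["zip"], "city")
```

In this made-up example the dependency zip -> city is violated by one tuple, so the sketch rewrites a single cell. A cost function based on string similarity, in the spirit of record linkage, could replace the unit cost without changing the structure.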


Exploratory Analysis System for Semi-structured Engineering Logs

Flaster, Hillyer, Ho

2006


Abstract. Engineering diagnosis often involves analyzing complex records of system states printed to large, textual log files. Typically the logs are designed to accommodate the widest debugging needs without a rigorous plan for formatting. As a result, critical quantities and flags are mixed with less important messages in a loose structure. Once the system is sealed, the log format can no longer be changed, causing great difficulties for the technicians who need to understand the event correlations. We describe a modular system for analyzing such logs where document analysis, report generation, and data exploration tools are factored into generic, reusable components and domain-dependent, isolated plug-ins. The system supports incremental, focused analysis of complicated symptoms with minimal programming effort and software installation. We discuss important concerns in the analysis of logs that set it apart from understanding natural language text or rigorously structured computer programs. We highlight the research challenges that would guide the development of a deep analysis system for many kinds of semi-structured documents.
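The split between generic components and domain-dependent plug-ins described above might look roughly like the following sketch. It is an assumption-laden illustration, not the system from the paper: the names `regex_plugin` and `analyze`, the log lines, and the regular expressions are all hypothetical. The generic driver knows nothing about the log format, and each plug-in isolates one domain-specific pattern.

```python
import re
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, str]
Plugin = Callable[[str], Optional[Record]]

def regex_plugin(kind: str, pattern: str) -> Plugin:
    """Build a domain-specific plug-in from a regex with named groups."""
    compiled = re.compile(pattern)

    def extract(line: str) -> Optional[Record]:
        match = compiled.search(line)
        if match is None:
            return None
        return {"kind": kind, **match.groupdict()}

    return extract

def analyze(lines: Iterable[str], plugins: List[Plugin]) -> List[Record]:
    """Generic driver: the first plug-in that matches a line claims it."""
    records: List[Record] = []
    for line in lines:
        for plugin in plugins:
            record = plugin(line)
            if record is not None:
                records.append(record)
                break  # lines matched by no plug-in are skipped
    return records

# Hypothetical log format, invented for this example.
plugins = [
    regex_plugin("temp", r"TEMP=(?P<celsius>\d+)"),
    regex_plugin("flag", r"FLAG (?P<name>\w+) set"),
]
log = ["boot ok", "12:00 TEMP=71 fan on", "12:01 FLAG OVERHEAT set"]
records = analyze(log, plugins)
```

Supporting a new log format in this sketch means adding plug-ins only; the driver stays unchanged, which is the kind of isolation the abstract points to.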



A cost-based model and effective heuristic for repairing constraints by value modification
