2014 IEEE International Conference on Big Data (Big Data) 2014
DOI: 10.1109/bigdata.2014.7004207

BayesWipe: A multimodal system for data cleaning and consistent query answering on structured bigdata

Abstract: Recent efforts in data cleaning of structured data have focused exclusively on problems like data deduplication, record matching, and data standardization; none of these focus on fixing incorrect attribute values in tuples. Correcting values in tuples is typically performed by a minimum cost repair of tuples that violate static constraints like CFDs (which have to be provided by domain experts, or learned from a clean sample of the database). In this paper, we provide a method for correcting individual attribu…

Cited by 9 publications (5 citation statements)
References 24 publications (21 reference statements)
“…will evaluate techniques such as [29] that overcome the same problem by issuing a single but more complex query. Another aspect to consider is the introduction of online repairing of AFDs [36,37,38]: the idea is to automatically repair the errors in AFDs at query time, so that the user can directly receive the correct results without the need to modify the original data. Another direction is to broaden the scope of the approach to a scenario with multiple collections, thus extending the support to the whole DOD.…”
Section: Discussion (mentioning)
confidence: 99%
“…When the external downloaded data was imported to the local database, some data had conversion errors (Goh VT & Siddiqi, 2008). For example, when '2019-01-01' in the text file was converted to the data of Excel or SQL server, it was considered as digital data, so it may become 43,466 (days from 1900-01-01 to 2019-01-01) (De et al, 2014). For the data involving dates, this article designed a comparison data table, which could execute the update command to realize batch modification.…”
Section: Data Cleaning and Finishing (mentioning)
confidence: 99%
“…Data cleaning has to be done which includes detection and removal of errors, inconsistencies to improve data quality [4]. Certain methods include null value treatment, filling missing values, detection and treatment of outliers in data, data duplication [5].…”
Section: Data Preprocessing (mentioning)
confidence: 99%