Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, 2005
DOI: 10.1145/1066157.1066287

Data Cleaning in Microsoft SQL Server 2005

Abstract: When collecting and combining data from various sources into a data warehouse, ensuring high data quality and consistency becomes a significant, often expensive, challenge. Common data quality problems include inconsistent data conventions amongst sources such as different abbreviations or synonyms; data entry errors such as spelling mistakes; missing, incomplete, outdated or otherwise incorrect attribute values. These data defects generally manifest themselves as foreign-key mismatches and approximately dupli…

Citations: cited by 25 publications (20 citation statements)
References: 2 publications (1 reference statement)
“…, r_m }, such that each object is represented by one or more descriptions. Entity Resolution has two main instances: Lookup [2], [5] and Grouping [2], [9].…”
Section: A. ER Problem Definition
mentioning
confidence: 99%
“…Our approach is proved to be efficient even with low quality data, because contextual features are more robust in presence of video data with poor quality. To exploit contextual information for PI, we connect PI problem with a well-studied entity resolution problem [2], [5], [9], which is typically considered in the context of textual data. Entity resolution is a very active research area where many generic approaches have been proposed, many of which could potentially be applied to the PI problem.…”
mentioning
confidence: 99%
“…For example, entity resolution is known as fuzzy grouping operation in the data-integration module of Microsoft SQL Server DBMS [10]. Having a less analyst-dependent technique makes that operation of wide applicability, so that non-expert users can apply it to their datasets.…”
Section: Motivation For Adaptiveness
mentioning
confidence: 99%
“…This is a more general disambiguation challenge known as entity resolution. It is also known as fuzzy grouping [10] and object consolidation [2]. In the generic settings of this disambiguation problem, a dataset D describes a set of entities E = {e1, e2, .…”
Section: Introduction
mentioning
confidence: 99%
“…Notice, since FBS does not use any relationships, including the random noise, its curve stays flat as well. minimizes the analyst participation, which is important since nowadays various data-integration solutions are incorporated in real Database Management Systems (DBMS), such as Microsoft SQL Server DBMS [6]. Having a less analyst-dependent technique makes that operation of wide applicability, so that non-expert users can apply it to their datasets.…”
Section: Experiments On the Publications Domain
mentioning
confidence: 99%