2021
DOI: 10.54623/fue.fcij.6.1.3

Data Quality Dimensions, Metrics, and Improvement Techniques

Abstract: Achieving a high level of data quality is considered one of the most important assets for small, medium, and large organizations. Data quality is a major concern for both practitioners and researchers who deal with traditional or big data. The level of data quality is measured through several quality dimensions. A high percentage of current studies focus on assessing and applying data quality on traditional data. As we are in the era of big data, attention should be paid to the tremendous volume of g…

Cited by 6 publications
(3 citation statements)
References 0 publications
“…Rehm and Goel, 2015; Storey and Kelly, 2002; Singh and Masuku, 2014). (Gabr, 2021; Pipino et al., 2002) (Evergreen and Emery, 2016…”
Section: Sergeeva And (mentioning)
confidence: 99%
“…Evaluating the dataset quality is challenging and must be done prior to any data modeling. While there are metrics that evaluate some important properties of a dataset (accuracy, completeness, consistency, timeliness, and others), these metrics often overlap [16]. Also, these metrics are more focused on the quality of data, and there is a lack of complete and proven methodologies for assessing the quality of datasets from a general perspective [30].…”
Section: Related Work (mentioning)
confidence: 99%
“…Many data quality dimensions have been addressed in the literature [1]-[5]; data duplication has been considered one of the most intriguing. Data duplication is defined as multiple representations of the same real-world object, or as a measure of undesirable duplicates within a certain field, record, or dataset [6]. Duplication comes in two types, deterministic and probabilistic [7], [8].…”
Section: Introduction (mentioning)
confidence: 99%
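
The duplication dimension described in the last citation statement can be illustrated with a minimal sketch. The statement does not define the two duplication types, so the reading used here is an assumption: "deterministic" is taken to mean an exact match after simple normalization, and "probabilistic" a string-similarity score at or above a threshold. The function names, the sample records, and the 0.85 threshold are all hypothetical and do not come from the cited papers.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(record):
    """Lowercase each field and collapse whitespace (assumed normalization)."""
    return tuple(" ".join(str(value).lower().split()) for value in record)


def deterministic_duplicates(records):
    """Count records whose normalized fields exactly match an earlier record."""
    seen = set()
    duplicates = 0
    for record in records:
        key = normalize(record)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates


def probabilistic_pairs(records, threshold=0.85):
    """Return index pairs whose similarity ratio meets the assumed threshold."""
    texts = [" ".join(normalize(record)) for record in records]
    return [
        (i, j)
        for i, j in combinations(range(len(texts)), 2)
        if SequenceMatcher(None, texts[i], texts[j]).ratio() >= threshold
    ]


# Hypothetical sample data: two exact duplicates (after normalization),
# one near-duplicate with a typo, and one unrelated record.
records = [
    ("Alice Smith", "Cairo"),
    ("alice  smith", "cairo"),
    ("Alise Smith", "Cairo"),
    ("Bob Jones", "Giza"),
]

dup_count = deterministic_duplicates(records)
duplication_rate = dup_count / len(records)  # share of records that repeat an earlier one
pairs = probabilistic_pairs(records)

print(f"deterministic duplicates: {dup_count} (rate {duplication_rate:.2f})")
print(f"candidate probabilistic pairs: {pairs}")
```

The pairwise comparison is quadratic in the number of records, which is acceptable for a small illustration but would need blocking or indexing on larger datasets.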