Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 2020
DOI: 10.1145/3394486.3406477

Overview and Importance of Data Quality for Machine Learning Tasks

Cited by 129 publications (60 citation statements)
References 8 publications
“…Such good results are due mainly to the good quality of entry data that was manually created and configured by expert linguists. Our methodology confirms the importance of data quality over quantity for ML applications [70].…”
Section: Input Processing (supporting)
confidence: 71%
“…[5]. For example, it may include details of how a dataset fares across certain pre-defined quality metrics known to influence model building efforts [12]. The specific format of visualizing this information may vary depending on output requirements and constraints.…”
Section: Baseline Data Quality and Readiness Analysis (mentioning)
confidence: 99%
“…These mainly contain the key sections covered in the data readiness report template described in Figure 3. We made use of some of the machine learning related quality metrics mentioned in [12] for illustration. Due to space constraints, we include only limited features just to exemplify how key information from the quality analysis process can be represented in the report.…”
Section: References (mentioning)
confidence: 99%

Data Readiness Report

Afzal, C, Kesarwani et al. 2020 (preprint, self-citation)
“…While other topics are actively researched in the ML literature, such as the improvement of data quality [6][7][8] or the development of better-performing models [9][10][11], the metrics used to evaluate these predictive pipelines have taken a relatively minimal place in this field. Caruana and Niculescu-Mizil [12] provided one of the earliest comprehensive works on the topic, presenting nine performance metrics for binary classification, which they divided into three groups: threshold metrics, ordering/rank metrics, and probability metrics.…”
Section: Introduction (mentioning)
confidence: 99%