2019
DOI: 10.1007/978-3-030-27615-7_17

A DaQL to Monitor Data Quality in Machine Learning Applications

Cited by 28 publications (17 citation statements)
References 6 publications
“…As the data model influences data warehouse quality, the study in [34] used statistical and machine learning methods to predict the effect of structural metrics on the efficiency and effectiveness of a conceptual model. The study in [35], in turn, presented a tool for continuously monitoring data quality to increase the prediction accuracy of machine learning models.…”
Section: Existing Computing Methods for Scrutinizing Engaged Human Resources
Citation type: mentioning
confidence: 99%
“…The curation task can be implemented manually or automated. Data standardization, de-duplication, and matching are examples of automated tasks [19]. The focus of this research lies in automated data curation, in particular anomaly detection.…”
Section: E. Data Curation
Citation type: mentioning
confidence: 99%
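
To make the automated curation tasks named in that statement concrete, here is a minimal sketch of standardization followed by de-duplication, assuming a hypothetical pandas DataFrame with name and email columns; the columns and normalization rules are illustrative and not taken from the cited work.

    import pandas as pd

    def curate(df: pd.DataFrame) -> pd.DataFrame:
        """Automated curation sketch: standardize fields, then de-duplicate."""
        out = df.copy()
        # Standardization: normalize casing and surrounding whitespace.
        out["name"] = out["name"].str.strip().str.title()
        out["email"] = out["email"].str.strip().str.lower()
        # De-duplication / matching: rows sharing a normalized email are
        # treated as the same entity; keep the first occurrence.
        return out.drop_duplicates(subset="email", keep="first")

    records = pd.DataFrame({
        "name": ["Ada Lovelace ", "ada lovelace", "Alan Turing"],
        "email": ["ADA@example.org", "ada@example.org", "alan@example.org"],
    })
    print(curate(records))  # two rows remain after matching on email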
“…Models that are derived from it will perform poorly and generate unreliable conclusions. According to [19], the key to high-quality ML lies in three principles of data quality: prevention, detection, and correction. Anomaly detection aims to find abnormal patterns that deviate from the rest of the data, called anomalies or outliers [20].…”
Section: E. Data Curation
Citation type: mentioning
confidence: 99%
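
As an illustration of the detection principle, the following is a minimal sketch of outlier detection using the median-absolute-deviation (modified z-score) rule; the sensor readings and the conventional 3.5 cutoff are assumptions for demonstration, not part of the cited approach.

    import numpy as np

    def mad_outliers(values: np.ndarray, threshold: float = 3.5) -> np.ndarray:
        """Flag anomalies whose modified z-score exceeds the threshold."""
        median = np.median(values)
        mad = np.median(np.abs(values - median))  # median absolute deviation
        modified_z = 0.6745 * (values - median) / mad
        return np.abs(modified_z) > threshold

    readings = np.array([10.1, 9.8, 10.3, 10.0, 55.0, 9.9])
    print(mad_outliers(readings))  # only the 55.0 reading is flagged

A robust statistic such as the median is used here because a single extreme value would otherwise inflate the mean and standard deviation and thereby mask itself.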
“…A technique for improving accuracy in the absence of trustworthy sources is pointed out by [35], where many cheap labels from noisy labelers are combined. Various data validation frameworks [36], [37], [38], [39], [40] have been developed to address challenges relating to completeness, consistency, and timeliness: Deequ, DuckDQ, DaQL, and TFX enable practitioners to encode their expectations about what the data should look like. Automated data cleaning methods based on anomaly or novelty detection offer the option to remove inconsistent data tuples even without any expert knowledge [41], [42], [43].…”
Section: Datasets
Citation type: mentioning
confidence: 99%
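
To illustrate what encoding such expectations can look like, here is a minimal plain-Python sketch of declarative data checks in the spirit of Deequ, DuckDQ, DaQL, and TFX; the check names, predicates, and sample records are hypothetical and deliberately do not mirror any of those frameworks' actual APIs.

    from typing import Callable

    # Each check pairs a human-readable expectation with a predicate over a batch.
    Check = tuple[str, Callable[[list[dict]], bool]]

    checks: list[Check] = [
        ("id is complete",  lambda rows: all(r.get("id") is not None for r in rows)),
        ("id is unique",    lambda rows: len({r["id"] for r in rows}) == len(rows)),
        ("age in [0, 130]", lambda rows: all(0 <= r["age"] <= 130 for r in rows)),
    ]

    def validate(rows: list[dict], checks: list[Check]) -> list[str]:
        """Return the expectations that the incoming batch violates."""
        return [name for name, predicate in checks if not predicate(rows)]

    batch = [{"id": 1, "age": 34}, {"id": 2, "age": 150}]
    print(validate(batch, checks))  # ['age in [0, 130]']

Running such checks on every incoming batch, rather than once at training time, is what turns encoded expectations into the kind of continuous data quality monitoring that the cited DaQL paper targets.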