Proceedings of the Workshop on Human-in-the-Loop Data Analytics 2016
DOI: 10.1145/2939502.2939511
Towards reliable interactive data cleaning

Cited by 55 publications (43 citation statements)
References 23 publications
“…additionally, each of these tasks encapsulates several specialized algorithms, such as machine learning, clustering, or rule-based procedures. This research proposes the idea of an easy and fully automated data cleaning process (Krishnan, Haas, Franklin and Wu, 2016).…”
Section: State of the Art
Mentioning confidence: 99%
“…For instance, scalability suffers because the code for data cleaning must be created in house according to the company's requirements, and automatic recoverability is limited by the need for human intervention to restore the data cleaning script. Hence, we need to avoid human intervention in the process (Krishnan et al, 2016).…”
Section: Intuitive Proposal for Data Cleaning
Mentioning confidence: 99%
“…This is referred to as editable shared representations between computers and humans [ 26 ]. Examples include natural language interfaces and form-based input [ 27 ]. Finally, domain experts are highly trained individuals, which allows systems to accelerate their input by using domain-specific assumptions and ontologies [ 28 , 29 ].…”
Section: Introduction
Mentioning confidence: 99%
“…In contrast, tools at earlier pipeline stages have been designed mainly for data scientists and not for experts. However, domain experts are involved at every stage of the pipeline [ 27 - 31 ], especially in clinical research settings where data sets contain specialized information. Thus, there is a need to amplify domain expertise throughout the pipeline.…”
Section: Introduction
Mentioning confidence: 99%
“…Motivation. Sampling-based approaches have been adopted to alleviate the burden of big data volume, not only when approximate results are as useful as exact ones [1-5], but also when the results from a small clean sample can be more accurate than those from the entire dirty data [6-9]. It is common practice to iteratively generate small random samples of a big data set to explore the statistical properties of the entire data and define cleaning rules [10-19]. This iterative process becomes impractical or impossible on small computing clusters due to the communication, I/O, and memory costs of cluster computing frameworks that implement a shared-nothing architecture [20-22].…”
Section: Introduction
Mentioning confidence: 99%
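The excerpt above describes an iterative, sample-based cleaning workflow: draw a small random sample of a large dirty data set, inspect its statistics, and derive a cleaning rule from the sample rather than from the full data. The following is a minimal illustrative sketch of that idea on synthetic data; the data set, the sentinel error values, and the range thresholds in `cleaning_rule` are all assumptions made for the example, not details from the cited work.

```python
import random

random.seed(0)

# Synthetic "dirty" data: mostly valid ages, plus injected sentinel errors.
data = [random.randint(18, 90) for _ in range(10_000)]
data += [-1] * 200 + [999] * 100  # assumed dirty values for illustration
random.shuffle(data)

def draw_sample(records, k=500):
    """Draw a small random sample to explore the data's statistics."""
    return random.sample(records, k)

sample = draw_sample(data)

# Inspecting the sample's extremes suggests out-of-range values exist;
# from that observation one might define a simple range rule.
print(min(sample), max(sample))

def cleaning_rule(value, low=0, high=120):
    """Hypothetical range rule derived from inspecting the sample."""
    return low <= value <= high

# The rule, defined on the sample, is then applied to the entire data set.
cleaned = [v for v in data if cleaning_rule(v)]
print(len(data) - len(cleaned))  # records removed by the rule
```

The point of the sketch is the division of labor the excerpt describes: the expensive exploratory step runs on a small sample, while only the cheap, already-defined rule touches the full data.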