2016 International Conference on Information Technology Systems and Innovation (ICITSI) 2016
DOI: 10.1109/icitsi.2016.7858197

Data profiling for data quality improvement with OpenRefine

Citations: cited by 27 publications (8 citation statements)
References: 2 publications
“…OpenRefine 11 (formerly Google Refine, abbrev. OR) is a free and open-source DQ tool dedicated to data cleansing and data transformation and was discovered through (Kusumasari et al, 2016) in the IEEE search results, and (Tsiflidou and Manouselis, 2013) in the Springer Link search results as well as on GitHub 12 . While the original functionality of the tools does not primarily align with the focus of our survey, its extension MetricDoc specifically aims at assessing DQ with “customizable, reusable quality metrics in combination with immediate visual feedback” (Bors et al, 2018).…”
Section: Data Quality Tool Evaluation (mentioning)
confidence: 99%
“…While the original functionality of the tools does not primarily align with the focus of our survey, its extension MetricDoc specifically aims at assessing DQ with “customizable, reusable quality metrics in combination with immediate visual feedback” (Bors et al, 2018). Apart from the mention by Tsiflidou and Manouselis (2013) and Kusumasari et al (2016), OpenRefine was not evaluated in one of the previous DQ tool surveys, although it is open source. We installed the tool from GitHub and evaluated OpenRefine version 3.0 with the MetricDoc extension (where no version was provided), downloaded on February 14th, 2019.…”
Section: Data Quality Tool Evaluation (mentioning)
confidence: 99%
“…In addition, the company's receivables can continue to grow because inactive customer accounts can continue to be recorded in arrears. Thus, data cleansing is needed to overcome these problems by eliminating duplicate data, noise, inaccuracy, and data discrepancies so that companies can make proper decision making [11], [12], [13], [14].…”
Section: Introduction (mentioning)
confidence: 99%
“…The result of the research shows that the Permit Number has 70 patterns on 5000 rows of data. Duplication analysis needs to be combined with other elements because one production with a single license number can be duplicated if the factory location, volume and weight of the package are different [3]. [3] There is also previous research by Febri on profiling clustering data by implementing fingerprint algorithm using BPOM dataset and tested by comparative test with result of every algorithm implemented in each application have difference.…”
Section: Introduction (mentioning)
confidence: 99%
“…Duplication analysis needs to be combined with other elements because one production with a single license number can be duplicated if the factory location, volume and weight of the package are different [3]. [3] There is also previous research by Febri on profiling clustering data by implementing fingerprint algorithm using BPOM dataset and tested by comparative test with result of every algorithm implemented in each application have difference. The comparative results are that Pentaho found 602 lines from 4482 lines, Talend Open Studio found 502 lines from 4482 lines and Google OpenRefine found 562 lines from 4482 lines.…”
Section: Introduction (mentioning)
confidence: 99%
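
The last two statements refer to clustering with a fingerprint algorithm. As a rough illustration only, the sketch below shows key-collision clustering in the general style of a fingerprint keying method: normalize each value to a key, then group values that share a key. The normalization steps, the function names `fingerprint` and `cluster`, and the sample company names are assumptions made for this example; they are not taken from the cited papers and may differ from what Pentaho, Talend Open Studio, or OpenRefine actually implement.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Return a normalized key; values with the same key are duplicate candidates."""
    # Trim surrounding whitespace and ignore case.
    text = value.strip().lower()
    # Fold accented characters to their closest ASCII form.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Drop ASCII punctuation so "PT." and "PT" compare equal.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace, remove repeated tokens, sort, and rejoin,
    # so that word order does not affect the key.
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values by fingerprint and keep groups with more than one distinct spelling."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]


# Hypothetical sample data, not drawn from the BPOM dataset.
names = ["PT. Indofood", "Indofood PT", "Kalbe Farma"]
print(cluster(names))  # [['PT. Indofood', 'Indofood PT']]
```

Because the key depends on the exact normalization rules, tools that differ in even one step (for example, how punctuation or accents are handled) can report different duplicate counts on the same data.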