2020
DOI: 10.3390/data5020050

Data Wrangling in Database Systems: Purging of Dirty Data

Abstract: Researchers need to be able to integrate ever-increasing amounts of data into their institutional databases, regardless of the source, format, or size of the data. It is then necessary to use the increasing diversity of data to derive greater value from data for their organization. The processing of electronic data plays a central role in modern society. Data constitute a fundamental part of operational processes in companies and scientific organizations. In addition, they form the basis for decisions. Bad dat…

Cited by 22 publications (10 citation statements)
References 12 publications (13 reference statements)

Citation statements
“…This results in a "tall" format with potentially many rows for each item. Dealing with dirty or ill-defined data introduces additional challenges of cleaning (making data types consistent, ensuring appropriate types), validation (checking for bad data) and removing or replacing anomalous values [5,48]. This may require decisions about densification or imputation [46,52] or about what to ignore [48].…”
Section: Table Techniques In Visualization Research (mentioning)
confidence: 99%
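
The cleaning, validation, and imputation steps named in the statement above can be illustrated with a minimal Python sketch. It is not taken from the cited papers: the function names `to_float` and `clean_column`, the fixed valid range, and the choice of median imputation are all assumptions made for illustration.

```python
from statistics import median


def to_float(raw):
    """Coerce a raw cell to float; return None when it cannot be parsed."""
    try:
        return float(str(raw).strip())
    except ValueError:
        return None


def clean_column(raw_values, lower, upper):
    """Hypothetical helper: make types consistent, validate, then impute.

    Values that fail to parse or fall outside [lower, upper] are treated
    as bad data and replaced with the median of the valid values.
    """
    # Cleaning: bring every cell to one numeric type.
    parsed = [to_float(v) for v in raw_values]

    # Validation: keep only parsed values inside the accepted range.
    def is_valid(v):
        return v is not None and lower <= v <= upper

    valid = [v for v in parsed if is_valid(v)]
    if not valid:
        raise ValueError("no valid values to impute from")

    # Imputation: replace anomalous or missing values with the median.
    fill = median(valid)
    return [v if is_valid(v) else fill for v in parsed]


cleaned = clean_column(["1", " 2.5 ", "n/a", 3, "999", None], lower=0, upper=100)
print(cleaned)
```

Whether to replace bad values, drop the rows, or ignore them entirely is the kind of decision the statement refers to; median replacement is only one option.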
“…In Computer Science, many papers have discussed methods for the data quality management in information systems from different domains, such as the first-time-right principle, the closed-loop principle, data catalogue, data profiling (Azeroual et al, 2018b), data cleansing (Azeroual et al, 2018a), data wrangling (Azeroual, 2020), data monitoring, data lakes (Mathis, 2017), data text mining (Azeroual, 2019), and machine learning (Duka and Hribar, 2010; Maali et al, 2010); these papers have also shown how the methods can be used in practice to ensure data quality. The methods of data cleaning and monitoring range from fully automated to mostly manual operations, which is closely related to the amount of knowledge required for each operation.…”
Section: How To Improve Data Quality (mentioning)
confidence: 99%
“…A study has been done to gain insight into the Exploratory Data analysis techniques regarding cyber events. Data Analysis techniques were used in similar tasks such as for Unified Host and Network data set (Beazley et al, 2019), A Cyber Threat intelligence Perspective (Al-Mohannadi et al, 2020), Analysis of Cyber Defence Exercise using exploratory sequential analysis (Andersson et al, 2011), Intrusion Detection Technique based on proposed Statistical Flow Features for Protecting Network Traffic of Internet of Things (Moustafa et al, 2019).…”
Section: Related Work (mentioning)
confidence: 99%