2018
DOI: 10.3390/sym10040099

How to Address the Data Quality Issues in Regression Models: A Guided Process for Data Cleaning

Abstract: Today, data availability has gone from scarce to superabundant. Technologies like IoT, trends in social media and the capabilities of smart-phones are producing and digitizing lots of data that was previously unavailable. This massive increase of data creates opportunities to gain new business models, but also demands new techniques and methods of data quality in knowledge discovery, especially when the data comes from different sources (e.g., sensors, social networks, cameras, etc.). The data quality process …

Cited by 32 publications (27 citation statements)
References 71 publications

“…Data acquired in Section 2.1 was prepared and cleaned in order to obtain an appropriate dataset for understanding the fluctuation of the avocado market in the United States, based on different weather conditions. For this purpose, we followed the Cross Industry Standard Process for Data Mining (CRISP-DM) methodology [14] and the data cleaning process proposed in [15]. Figure 1 exposes the data preparation tasks that were considered.…”
Section: Data Selection and Cleaning (mentioning)
confidence: 99%
“…• Build an integrated data quality framework for several knowledge discovery tasks as regression [37], clustering and association rules. The integrated data quality framework would consider the Big Data paradigm [29] and hence huge datasets.…”
Section: Discussion (mentioning)
confidence: 99%
“…Data quality issues generally appear when the quality requirements are not met on the data values [41]. These issues are due to several factors or processes having occurred at different levels: In [21,35,42], many causes of poor data quality were enumerated, and a list of elements, which affect the quality and DQD's was produced. This list is illustrated in Table 4.…”
Section: Data Quality Issues (mentioning)
confidence: 99%