2016
DOI: 10.3233/aic-160710

A survey on pre-processing techniques: Relevant issues in the context of environmental data mining

Abstract: One of the important issues related with all types of data analysis, either statistical data analysis, machine learning, data mining, data science or whatever form of data-driven modeling, is data quality. The more complex the reality to be analyzed is, the higher the risk of getting low quality data. Unfortunately real data often contain noise, uncertainty, errors, redundancies or even irrelevant information. Useless models will be obtained when built over incorrect or incomplete data. As a consequence, the q…

Citations: cited by 51 publications (34 citation statements)
References: 175 publications (135 reference statements)

“…Databases from each centre are harmonized into a single data base by applying the data-cleaning pre-processing techniques. Descriptive statistics and data visualisation methods are used in order to detect outliers, data errors, missing data and influential observations [12]. A double-checking process correcting errors and completing missing information is carried out to minimize incomplete and erroneous data.…”
Section: Methods (mentioning)
confidence: 99%
“…Such iterative and explorative nature of the modeling process is commonly tedious and time-consuming. Moreover, the quality of the ML results is also dependent of data and feature engineering aspects (e.g., feature selection, outlier detection) (Domingos, 2012) that are typically performed on the Data Understanding and Data Preparation CRISP-DM stages (Gibert et al, 2016).…”
Section: CRISP-DM and AutoML (mentioning)
confidence: 99%
“…The correctness of the predictions and their reservations developed by the ML algorithms depend on the data quality, model representativeness and the reliance between the input and target variables in the collected datasets [14,15]. Data with high level of noise, erroneous data, presence of outliers, biases and incomplete datasets may significantly reduce the predictive efficiency of the models [16,17]. To overcome the issues, this research paper designed a DCRN model to predict the crop yield by using rainfall parameter.…”
Section: Introduction (mentioning)
confidence: 99%