2020
DOI: 10.3390/ijgi9040227

Missing Data Imputation for Geolocation-based Price Prediction Using KNN–MCF Method

Abstract: Accurate house price forecasts are very important for formulating national economic policies. In this paper, we offer an effective method to predict houses’ sale prices. Our algorithm includes one-hot encoding to convert text data into numeric data, feature correlation to select only the most correlated variables, and a technique to overcome missing data. Our approach is an effective way to handle missing data in large datasets with the K-nearest neighbor algorithm based on the most correlated features (KN…
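The abstract is cut off before it defines KNN–MCF, so the exact procedure is not reproduced here. What follows is a minimal sketch of the three ingredients it names, under stated assumptions: the helpers `one_hot`, `most_correlated`, and `knn_impute_column` are hypothetical; "most correlated" is taken to mean the largest absolute Pearson correlation with the column being imputed; and each missing value is filled with the mean of that column over its k nearest complete rows.

```python
import numpy as np

# Hypothetical helpers illustrating the idea; not the paper's published code.


def one_hot(values):
    """Encode a list of category strings as a 0/1 matrix, one column per category."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    encoded = np.zeros((len(values), len(categories)))
    for row, v in enumerate(values):
        encoded[row, index[v]] = 1.0
    return encoded, categories


def most_correlated(X, target_col, m):
    """Indices of the m columns with the largest |Pearson r| against target_col,
    computed on fully observed rows only (assumed selection rule)."""
    complete = ~np.isnan(X).any(axis=1)
    r = np.abs(np.corrcoef(X[complete], rowvar=False)[target_col])
    r = np.nan_to_num(r)   # constant columns yield NaN; treat them as uncorrelated
    r[target_col] = -1.0   # never pick the column being imputed
    return np.argsort(r)[::-1][:m]


def knn_impute_column(X, target_col, k=3, m=2):
    """Fill NaNs in X[:, target_col] with the mean of that column over the k nearest
    donor rows, using Euclidean distance on the m most correlated features only."""
    X = X.copy()
    feats = most_correlated(X, target_col, m)
    donor_mask = ~np.isnan(X[:, target_col]) & ~np.isnan(X[:, feats]).any(axis=1)
    donors = X[donor_mask]  # copy taken before imputing, so filled rows never act as donors
    for i in np.where(np.isnan(X[:, target_col]))[0]:
        if np.isnan(X[i, feats]).any():
            continue  # no distance can be computed; left missing in this sketch
        dist = np.linalg.norm(donors[:, feats] - X[i, feats], axis=1)
        nearest = np.argsort(dist)[:k]
        X[i, target_col] = donors[nearest, target_col].mean()
    return X


# Toy data (invented): columns = [area, rooms, price]; NaN marks a missing price.
X = np.array([
    [50.0, 2.0, 100.0],
    [52.0, 2.0, 104.0],
    [80.0, 3.0, 160.0],
    [82.0, 3.0, 166.0],
    [51.0, 2.0, np.nan],
])
filled = knn_impute_column(X, target_col=2, k=2, m=2)
print(filled[4, 2])  # 102.0
```

The missing price's two nearest neighbors in (area, rooms) space are the rows priced 100 and 104, so it is filled with 102. Because KNN distances are scale-sensitive, standardizing the selected features before measuring distance is usually advisable; it is omitted here to keep the arithmetic easy to follow.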

Cited by 31 publications (17 citation statements)
References: 23 publications (30 reference statements)

“…The technique can aid the description, generalization and categorization of a given set of data by breaking the dataset into smaller subsets while incrementally developing an associated decision tree with decision nodes and leaf nodes. We used grid search to get the best set of hyperparameters for the model: we tested different values for the min sample split, s = [5, 10, 15, 20], and s = 10 was found to be the best for the model with a max depth of 3. A 10-fold cross-validation was used to estimate the performance of the model.…”
Section: Decision Tree (DT) · mentioning
confidence: 99%
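The passage describes grid search over the min sample split with the depth fixed at 3 and 10-fold cross-validation, but it does not name a library or say whether the tree was a classifier or a regressor. A minimal sketch in scikit-learn, assuming a regression tree on synthetic data (`min_samples_split` is scikit-learn's name for the min sample split), could look like this. On this invented data the winning value need not be 10.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data; the citing work's dataset is not reproduced here.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Depth fixed at 3 as in the quoted passage; only min_samples_split is searched.
search = GridSearchCV(
    estimator=DecisionTreeRegressor(max_depth=3, random_state=0),
    param_grid={"min_samples_split": [5, 10, 15, 20]},
    cv=10,  # 10-fold cross-validation for every candidate value
)
search.fit(X, y)

# For a regressor the default score is R^2, averaged over the 10 folds.
print(search.best_params_, round(search.best_score_, 3))
```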
“…LCS, however, are prone to various failures including bias, drifts, precision degradation, and loss of considerable amount of data due to operational issues [2]. Missing data is a pervasive issue which occurs in most real-world datasets including medical records [3,4], geo-informatics [5], traffic flow [6] and industrial applications [7,8]. The European Union Data Quality Directive (EU-DQD) [9] defined the data quality objective (DQO) that a monitoring method needs to comply with to be used as indicative measurement for regulative purposes.…”
Section: Introduction · mentioning
confidence: 99%
“…It is most useful when there are only a few hyperparameters to optimize but would usually be outperformed by other weighted-random search algorithms when the model grows in complexity. We tested different values for the min sample split, s = [5, 10, 15, 20], and s = 10 was found to be the best for the model with a max depth of 3. A 10-fold cross-validation was used to estimate the performance of the model.…”
Section: B. Decision Tree (DT) · mentioning
confidence: 99%
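The claim that grid search suits only a few hyperparameters follows from its cost: it trains one model per combination per fold, so the number of fits multiplies with every hyperparameter added, whereas random search fixes its budget in advance. The small grid below mirrors the quoted passage; the larger grid and the helpers `grid_candidates` and `total_fits` are hypothetical, for illustration only.

```python
from itertools import product
from math import prod


def grid_candidates(param_grid):
    """Expand a dict of hyperparameter value lists into every combination (the full grid)."""
    names = list(param_grid)
    return [dict(zip(names, combo)) for combo in product(*param_grid.values())]


def total_fits(param_grid, folds):
    """Grid search trains one model per candidate per cross-validation fold."""
    return prod(len(v) for v in param_grid.values()) * folds


# Grid from the quoted passage: four split values, depth fixed at 3.
small = {"min_samples_split": [5, 10, 15, 20], "max_depth": [3]}

# Hypothetical larger grid: four hyperparameters instead of one.
large = {
    "min_samples_split": [5, 10, 15, 20],
    "max_depth": [2, 3, 4, 5, 6],
    "min_samples_leaf": [1, 2, 4, 8],
    "max_features": [0.25, 0.5, 0.75, 1.0],
}

print(len(grid_candidates(small)), total_fits(small, folds=10))  # 4 40
print(len(grid_candidates(large)), total_fits(large, folds=10))  # 320 3200
```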
“…LCS, however, are prone to diverse issues including bias, drifts, precision degradation, and loss of considerable amount of data due to operational issues [2]. Missing data is a pervasive issue, affecting most real-world datasets including medical records [3], [4], geo-informatics [5], traffic flow [6] and industrial applications [7], [8]. The European Union Data Quality Directive (EU-DQD) defined the data quality objective (DQO) that a monitoring method needs to comply with to be used as indicative measurement for regulative purposes [9].…”
Section: Introduction · mentioning
confidence: 99%
“…Later, the methods of mathematical statistics were introduced to simplify the behavior prediction into a two-category problem [16, 17]. KNN is one of the simplest classification methods, which is widely used in vehicle sales forecasting [18], health monitoring [19], housing price forecasting [20], and other fields. Similar to KNN, SVM is also a popular classification method, which is based on the structural risk minimization (SRM) principle of statistical learning theory and has excellent generalization performance [21–23].…”
Section: Introduction · mentioning
confidence: 99%
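KNN is called one of the simplest classification methods because prediction needs no training step: a new point takes the majority label among its k nearest training points. A minimal sketch of that rule, assuming Euclidean distance; the function `knn_predict` and the two-cluster data are invented for illustration.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points (Euclidean)."""
    dist = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dist)[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Invented two-cluster data with a binary ("two-category") label.
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_train = np.array(["low", "low", "low", "high", "high", "high"])

print(knn_predict(X_train, y_train, np.array([0.5, 0.5])))  # low
print(knn_predict(X_train, y_train, np.array([5.5, 5.5])))  # high
```

An odd k avoids tied votes in a two-category problem such as this one.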