Missing Features Reconstruction and Its Impact on Classification Accuracy

Friedjungová, Magda; Jiřina, Marcel; Vašata, Daniel

doi:10.1007/978-3-030-22744-9_16

Cited by 5 publications (4 citation statements)

References 19 publications


“…Additionally, the impact of missing data and imputation methods on the analysis of activity patterns underscores the importance of accurate imputation techniques [32]. Furthermore, the use of artificial neural networks for missing feature reconstruction highlights the relevance of advanced techniques in imputation [33]. Moreover, neural models have been employed for the imputation of missing ozone data, demonstrating the applicability of machine learning in addressing missing data in various domains [34].…”

Section: Data Imputation Methods (mentioning; confidence: 99%)

Imputation of missing microclimate data of coffee-pine agroforestry with machine learning

Nurwarsito, Suprayogo, Sakti, et al. 2024

Int. J. Adv. Intell. Informatics


This research presents a comprehensive analysis of various imputation methods for addressing missing microclimate data in the context of coffee-pine agroforestry land in UB Forest. Utilizing big data and machine learning methods, the research evaluates the effectiveness of imputing missing microclimate data with Interpolation, Shifted Interpolation, K-Nearest Neighbors (KNN), and Linear Regression across multiple time frames: 6 hours, daily, weekly, and monthly. The performance of these methods is assessed using four key evaluation metrics: Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE). The results indicate that Linear Regression consistently outperforms the other methods across all time frames, yielding the lowest MAE, MSE, RMSE, and MAPE. This finding underscores the robustness and precision of Linear Regression in handling the variability inherent in microclimate data within agroforestry systems. The research highlights the critical role of accurate data imputation in agroforestry research and points towards the potential of machine learning techniques in advancing environmental data analysis. The insights gained from this research contribute to the field of environmental science, offering a reliable methodological approach for enhancing the accuracy of microclimate models in agroforestry, thereby facilitating informed decision-making for sustainable ecosystem management.
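The abstract ranks the imputation methods by MAE, MSE, RMSE, and MAPE. As a minimal sketch, assuming the imputed values are scored against held-out observations that were hidden before imputation (the abstract does not describe the exact protocol), the four metrics can be computed as below. The function name `imputation_errors` and the example readings are hypothetical.

```python
import math
from typing import Sequence


def imputation_errors(actual: Sequence[float], imputed: Sequence[float]) -> dict:
    """Score imputed values against held-out ground truth with MAE, MSE, RMSE and MAPE."""
    if len(actual) != len(imputed) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    # MAPE divides by each observed value, so it is undefined if any of them is zero.
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined for zero-valued observations")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, imputed)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n  # in percent
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}


# Hypothetical held-out temperatures (degrees C) and the values an imputer produced for them.
observed = [24.0, 25.0, 26.0, 25.0]
filled = [23.0, 25.0, 27.0, 25.0]
scores = imputation_errors(observed, filled)
print(scores)  # MAE and MSE are both 0.5 here; RMSE is sqrt(0.5)
```

MSE and RMSE penalize large errors more heavily than MAE, while MAPE expresses error relative to the observed magnitude and breaks down for variables that can be exactly zero.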



“…The abnormal values are first removed from the data set, and then, linear interpolation is used for imputation of the missing data. Previous studies have demonstrated that using linear interpolation method to impute the data set with missing data less than 5% can achieve almost the same model accuracy as the complete data set. The statistical descriptions of the data after cleaning are also shown in Tables S1 and S2. The data set of 27 months (4938 samples) is divided into 9 blocks on average, each of which has similar seasonal characteristics.…”
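The two cleaning steps quoted above, discarding abnormal values and then linearly interpolating the gaps, can be sketched as follows, together with one way to cut the cleaned series into contiguous blocks. This is an illustration under stated assumptions, not the cited study's code: the series is assumed to be regularly sampled, the plausibility bounds passed to the hypothetical `mask_abnormal` helper are placeholders, gaps before the first or after the last valid reading are left unfilled, and "divided into 9 blocks on average" is read as a contiguous split whose block sizes differ by at most one sample.

```python
from typing import List, Optional, Sequence, Tuple

Series = List[Optional[float]]


def mask_abnormal(values: Sequence[Optional[float]], low: float, high: float) -> Series:
    """Mark readings outside the (hypothetical) plausibility range [low, high] as missing."""
    return [v if v is not None and low <= v <= high else None for v in values]


def missing_fraction(values: Sequence[Optional[float]]) -> float:
    """Share of entries that are missing, e.g. to check a 'less than 5%' condition."""
    return sum(v is None for v in values) / len(values)


def linear_interpolate(values: Sequence[Optional[float]]) -> Series:
    """Fill interior gaps on a straight line between the nearest valid neighbors.

    Leading and trailing gaps have only one valid neighbor and are left as None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            t = (i - left) / (right - left)  # fractional position inside the gap
            filled[i] = filled[left] + t * (filled[right] - filled[left])
    return filled


def split_blocks(n_samples: int, n_blocks: int) -> List[Tuple[int, int]]:
    """Return (start, stop) indices of n_blocks contiguous blocks of near-equal size."""
    base, extra = divmod(n_samples, n_blocks)
    bounds, start = [], 0
    for b in range(n_blocks):
        stop = start + base + (1 if b < extra else 0)  # first `extra` blocks get one more
        bounds.append((start, stop))
        start = stop
    return bounds


raw = [7.1, 7.3, 250.0, None, 7.9, 8.1]  # hypothetical readings; 250.0 is implausible
clean = linear_interpolate(mask_abnormal(raw, low=0.0, high=50.0))
blocks = split_blocks(4938, 9)  # sample count taken from the quoted case study
```

With 4938 samples and 9 blocks, `divmod` gives six blocks of 549 samples and three of 548.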

Section: Case Study (mentioning; confidence: 99%)

Mining Spatiotemporal Information for Harmful Algal Bloom Forecasting and Mechanism Interpreting

Jia, Xu, Jia, et al. 2024

ACS EST Water


A multistep spatiotemporal forecasting (MSTF) network is developed through incorporating the graph convolutional network (GCN) and the long short-term memory (LSTM) network within a sequence-to-sequence (seq2seq) framework. The MSTF method can not only extract spatial and temporal information from the input data but also make multistep-ahead and continuous predictions. An MSTF-based harmful algal bloom (HAB) forecasting model is then formulated to predict the chlorophyll-a (Chl-a) concentration of the Dianchi Lake (China). The integrated gradients (IG) method is employed to interpret the trained MSTF model and quantify the attribution of each input dimension to the Chl-a prediction. Results indicate that (i) the coefficient of determination (R²) of the MSTF model in 24-h-ahead Chl-a prediction reaches 0.926, 28.4% higher than that of the traditional LSTM model; (ii) the ammonia nitrogen (12.3%), the total phosphorus (10.2%), the total nitrogen (9.9%), and the temperature (8.6%) are significant variables for Chl-a prediction; (iii) the spatial information from neighbor lake and river stations plays an important role in the HAB forecasting, with an average contribution of 35.0%; (iv) the proposed MSTF model is also skillful in the 72-h-ahead Chl-a prediction. Results presented highlight the importance of considering both spatial and temporal dependency of monitoring data in HAB forecasting and mechanism interpreting.
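The abstract attributes the Chl-a prediction to input dimensions with integrated gradients (IG). For an input x, a baseline x', and a differentiable model F, IG assigns feature i the score (x_i - x'_i) multiplied by the average of dF/dx_i along the straight path from x' to x. The sketch below approximates that path integral with a midpoint Riemann sum. It does not reproduce the MSTF network: the three-input `toy_model`, its hand-written gradient, the all-zero baseline, and the step count are assumptions chosen so the result can be checked by hand. The `attribution_shares` helper shows one plausible way to turn attributions into percentages like those quoted above; the study may normalize differently.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def integrated_gradients(
    grad_f: Callable[[Vector], Vector],
    x: Sequence[float],
    baseline: Sequence[float],
    steps: int = 50,
) -> Vector:
    """Approximate IG_i = (x_i - x'_i) * integral_0^1 dF/dx_i(x' + a(x - x')) da."""
    if len(x) != len(baseline):
        raise ValueError("input and baseline must have the same length")
    grad_sums = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_f(point)):
            grad_sums[i] += g
    # Average gradient along the path, scaled by the distance from the baseline.
    return [(xi - b) * s / steps for xi, b, s in zip(x, baseline, grad_sums)]


def attribution_shares(attributions: Sequence[float]) -> Vector:
    """Express each |attribution| as a percentage of the total absolute attribution."""
    total = sum(abs(a) for a in attributions)
    return [100.0 * abs(a) / total for a in attributions]


def toy_model(v: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained predictor: F(v) = 2*v0 + v1^2 + 0.5*v0*v2."""
    return 2 * v[0] + v[1] ** 2 + 0.5 * v[0] * v[2]


def toy_gradient(v: Sequence[float]) -> Vector:
    """Analytic gradient of toy_model."""
    return [2 + 0.5 * v[2], 2 * v[1], 0.5 * v[0]]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
ig = integrated_gradients(toy_gradient, x, baseline)
shares = attribution_shares(ig)
```

IG satisfies completeness: the attributions sum to F(x) - F(x'), which is 7.5 here. Because this toy gradient varies linearly along the path, the midpoint rule recovers the attributions 2.75, 4.0, and 0.75 up to floating-point rounding.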


“…ANN may become trapped in a local minimum on large datasets [56]. XGBT is suitable for processing structured feature data and unstructured data, which is not a good processing ability for unstructured data [57]. However, machine learning models are black box models.…”

Section: The Differences and Shortcomings of the Machine Learning Models (mentioning; confidence: 99%)

Performance Assessment of Four Data-Driven Machine Learning Models: A Case to Generate Sentinel-2 Albedo at 10 Meters

et al. 2023


High-resolution albedo has the advantage of a higher spatial scale from tens to hundreds of meters, which can fill the gaps of albedo applications from the global scale to the regional scale and can solve problems related to land use change and ecosystems. The Sentinel-2 satellite provides high-resolution observations in the visible-to-NIR bands, giving possibilities to generate a high-resolution surface albedo at 10 m. This study attempted to evaluate the performance of the four data-driven machine learning algorithms (i.e., random forest (RF), artificial neural network (ANN), k-nearest neighbor (KNN), and XGBoost (XGBT)) for the generation of a Sentinel-2 albedo over flat and rugged terrain. First, we used the RossThick-LiSparseR model and the 3D discrete anisotropic radiative transfer (DART) model to build the narrowband surface reflectance and broadband surface albedo, which acted as the training and testing datasets over flat and rugged terrain. Second, we used the training and testing datasets to drive the four machine learning models, and evaluated the performance of these machine learning models for the generation of Sentinel-2 albedo. Finally, we used the four machine learning models to generate a Sentinel-2 albedo and compared them with in situ albedos to show the models’ application potentials. The results show that these machine learning models have great performance in estimating Sentinel-2 albedos at a 10 m spatial scale. The comparison with in situ albedos shows that the random forest model outperformed the others in estimating a high-resolution surface albedo based on Sentinel-2 datasets over the flat and rugged terrain, with an RMSE smaller than 0.0308 and R² larger than 0.9472.
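Of the four models compared, k-nearest neighbor (KNN) regression is the simplest to show in full. The following sketch, assuming Euclidean distance on unscaled features and an unweighted mean of the neighbors' targets, predicts broadband albedo from narrowband reflectances and scores the predictions with RMSE and R², the two measures reported in the abstract. The toy reflectance/albedo pairs are hypothetical; they are not RossThick-LiSparseR or DART simulations, and the study's actual KNN configuration is not given here.

```python
import math
from typing import Sequence, Tuple


def knn_predict(
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[float],
    query: Sequence[float],
    k: int = 3,
) -> float:
    """Average the targets of the k training samples nearest to query (Euclidean distance)."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must lie between 1 and the number of training samples")
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    return sum(y for _, y in ranked[:k]) / k


def rmse_r2(actual: Sequence[float], predicted: Sequence[float]) -> Tuple[float, float]:
    """Return (RMSE, R^2), with R^2 = 1 - SS_res / SS_tot."""
    n = len(actual)
    mean = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return math.sqrt(ss_res / n), 1.0 - ss_res / ss_tot


# Hypothetical (red, NIR) reflectance pairs and their broadband albedos.
train_x = [(0.05, 0.30), (0.08, 0.35), (0.12, 0.28), (0.20, 0.25), (0.25, 0.40)]
train_y = [0.15, 0.18, 0.19, 0.22, 0.30]
test_x = [(0.07, 0.33), (0.22, 0.30)]
test_y = [0.17, 0.24]
predictions = [knn_predict(train_x, train_y, q, k=2) for q in test_x]
rmse, r2 = rmse_r2(test_y, predictions)
```

Because KNN measures distance in raw feature space, bands with larger numeric ranges dominate the neighbor search; standardizing features first is a common precaution.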

