Accurate crop yield forecasting is essential in the food industry’s decision-making process, where the vegetation condition index (VCI) and thermal condition index (TCI) coupled with machine learning (ML) algorithms play crucial roles. The drawback, however, is that a one-size-fits-all prediction model is often employed over an entire region without considering the subregional spatial variability of VCI and TCI that results from environmental and climatic factors. Furthermore, when using nonlinear ML, redundant VCI/TCI data present additional challenges that adversely affect the models’ output. This study proposes a framework that (i) employs higher-order spatial independent component analysis (sICA) and (ii) exploits a combination of principal component analysis (PCA) and ML (i.e., the PCA-ML combination) to address these two challenges and enhance crop yield prediction accuracy. The proposed framework consolidates common VCI/TCI spatial variability into their respective subregions, using Vietnam as an example. Compared to the one-size-fits-all approach, subregional rice yield forecasting models over Vietnam improved by 20% on average and by up to 60%. The PCA-ML combination outperformed ML-only models by 18.5% on average and by up to 45%. The framework generates rice yield predictions 1 to 2 months ahead of the harvest with an average error of 5%, displaying its reliability.
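The abstract does not define the two indices. As a minimal sketch, assuming the commonly used min-max definitions (VCI scaled from NDVI, TCI scaled inversely from brightness temperature, both against a multi-year range), they could be computed as follows. The function and parameter names are hypothetical and are not taken from the paper.

```python
def condition_index(value, lo, hi, invert=False):
    """Scale `value` to 0-100 relative to its multi-year range [lo, hi].

    With invert=True, higher raw values map to lower index values.
    """
    if hi == lo:
        raise ValueError("multi-year range is degenerate (hi == lo)")
    scaled = (hi - value) if invert else (value - lo)
    return 100.0 * scaled / (hi - lo)


def vci(ndvi, ndvi_min, ndvi_max):
    """Assumed definition: VCI = 100 * (NDVI - NDVI_min) / (NDVI_max - NDVI_min)."""
    return condition_index(ndvi, ndvi_min, ndvi_max)


def tci(bt, bt_min, bt_max):
    """Assumed definition: TCI = 100 * (BT_max - BT) / (BT_max - BT_min)."""
    return condition_index(bt, bt_min, bt_max, invert=True)
```

Under these assumed definitions, both indices run from 0 (least favorable conditions in the record) to 100 (most favorable), which is what makes them comparable across subregions.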
Machine learning (ML) has been widely used worldwide to develop crop yield forecasting models. However, it is still challenging to identify the most critical features from a dataset. Although either feature selection (FS) or feature extraction (FX) techniques have been employed, no research compares their performances and, more importantly, the benefits of combining both methods. Therefore, this paper proposes a framework that uses no feature reduction (All-F) as a baseline to investigate the performance of FS, FX, and a combination of both (FSX). The case study employs the vegetation condition index (VCI)/temperature condition index (TCI) to develop 21 rice yield forecasting models for eight sub-regions in Vietnam based on ML methods, namely linear, support vector machine (SVM), decision tree (Tree), artificial neural network (ANN), and Ensemble. The results reveal that FSX takes full advantage of both FS and FX, with FSX-based models performing best in 18 of the 21 cases, while FS-based and FX-based models perform best in 2 and 1 cases, respectively. These FSX-, FS-, and FX-based models improve on All-F-based models by 21% on average and by up to 60% in terms of RMSE. Furthermore, the 21 best models are built on Ensemble (13 models), Tree (6 models), linear (1 model), and ANN (1 model). These findings highlight the significant role of FS, FX, and especially FSX, coupled with a wide range of ML algorithms (especially Ensemble), in enhancing the accuracy of crop yield prediction.
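The improvements above are reported in terms of RMSE. The abstract does not give the exact formula, so the following is a minimal sketch assuming improvement is measured as the relative reduction in RMSE against the All-F baseline. The function names are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_improvement_pct(rmse_baseline, rmse_candidate):
    """Assumed metric: percentage reduction in RMSE relative to the baseline.

    Positive values mean the candidate (e.g. an FSX-based model) has a
    lower RMSE than the baseline (e.g. the All-F-based model).
    """
    if rmse_baseline <= 0:
        raise ValueError("baseline RMSE must be positive")
    return 100.0 * (rmse_baseline - rmse_candidate) / rmse_baseline
```

With this definition, a model whose RMSE falls from 2.0 to 1.5 on the same held-out data would count as a 25% improvement.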