2021
DOI: 10.1038/s41598-020-80820-1

Coupling machine learning and crop modeling improves crop yield prediction in the US Corn Belt

Abstract: This study investigates whether coupling crop modeling and machine learning (ML) improves corn yield predictions in the US Corn Belt. The main objectives are to explore whether a hybrid approach (crop modeling + ML) would result in better predictions, investigate which combinations of hybrid models provide the most accurate predictions, and determine the features from the crop modeling that are most effective to be integrated with ML for corn yield prediction. Five ML models (linear regression, LASSO, LightGBM…

Cited by 224 publications (130 citation statements)
References: 76 publications

Citation statements, ordered by relevance:

“…In a recent study, Shahhosseini et al [60] showed that adding physics-based crop model variables as input features to ML models can improve the performance of ML models by 29% on average in the US Corn Belt. They compared the performances of several ML models such as RF, linear regression (LR), least absolute shrinkage and selection operator (LASSO) regression, Light Gradient Boost (LightGBM), Extreme Gradient Boost (XGBoost), and also an ensemble of them to investigate their added value individually and in combination.…”
Section: Crop Yield Prediction
confidence: 99%
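
The feature-augmentation idea described in this excerpt can be illustrated with a minimal sketch. This is not the authors' code: the file names, column names, and the `sim_` prefix for crop-model outputs are hypothetical placeholders, and hyperparameters are left near their defaults.

```python
# Sketch: compare ML models trained on weather/soil features alone vs.
# the same features augmented with simulated crop-model outputs.
import pandas as pd
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from lightgbm import LGBMRegressor
from xgboost import XGBRegressor

# Hypothetical tables: observed yields with weather/soil predictors, plus
# variables simulated by a process-based crop model (e.g. APSIM).
obs = pd.read_csv("county_yield_weather_soil.csv")     # assumed file
sim = pd.read_csv("crop_model_simulated_outputs.csv")  # assumed file
data = obs.merge(sim, on=["county", "year"])

y = data["yield"]
X = data.drop(columns=["yield", "county", "year"])

models = {
    "LR": LinearRegression(),
    "LASSO": Lasso(alpha=0.1),
    "RF": RandomForestRegressor(n_estimators=300, random_state=0),
    "LightGBM": LGBMRegressor(random_state=0),
    "XGBoost": XGBRegressor(random_state=0),
}

# Crop-model outputs are assumed to share a "sim_" column prefix.
sim_cols = [c for c in X.columns if c.startswith("sim_")]
for name, model in models.items():
    for label, cols in [("ML only", X.columns.difference(sim_cols)),
                        ("ML + crop model", X.columns)]:
        rmse = -cross_val_score(model, X[cols], y, cv=5,
                                scoring="neg_root_mean_squared_error").mean()
        print(f"{name:8s} {label:16s} RMSE = {rmse:.2f}")
```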
“…AI4Water is built on top of Scikit-learn, CatBoost, XGBoost, and LightGBM libraries to build classical machine learning models. These models have been used in several hydrological studies (Ni et al., 2020; Huang et al., 2019; Shahhosseini et al., 2021). To build deep learning models using neural networks, AI4Water uses a popular deep learning platform called TensorFlow (Abadi et al., 2016).…”
Section: Workflow and Model Structure
confidence: 99%
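
Since the excerpt names the libraries AI4Water wraps for classical ML, a brief sketch against those underlying libraries (not the AI4Water API itself, which is not shown in the cited text) illustrates the kind of regressors involved; the data here are random placeholders.

```python
# Sketch: the classical gradient-boosting/regression learners that the
# cited toolkit builds on, used directly via their scikit-learn-style APIs.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor
from catboost import CatBoostRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))                # placeholder input features
y = X @ rng.normal(size=8) + rng.normal(scale=0.1, size=500)

for model in (GradientBoostingRegressor(),
              XGBRegressor(n_estimators=200),
              LGBMRegressor(n_estimators=200),
              CatBoostRegressor(iterations=200, verbose=0)):
    model.fit(X[:400], y[:400])
    # score() returns R^2 for these scikit-learn-compatible regressors.
    print(type(model).__name__, "R^2 =", round(model.score(X[400:], y[400:]), 3))
```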
“…Some other studies have made further advances and created hybrid crop model-ML methodologies by using crop model outputs as inputs to an ML model (Everingham et al., 2016; Feng et al., 2019). In a recent study, Shahhosseini et al. (2021) designed a hybrid crop model-ML ensemble framework in which a crop modeling framework (APSIM) was used to provide additional inputs to the yield prediction task (for more information about APSIM, refer to https://www.apsim.info/). The results demonstrated that coupling APSIM and ML could improve ML performance by up to 29% compared to ML alone.…”
Section: Introduction
confidence: 99%
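
A minimal sketch of the hybrid idea described above, assuming pre-built feature matrices: weather/soil predictors are concatenated with crop-model (e.g. APSIM) outputs, several learners are trained on the combined inputs, and their predictions are averaged. This illustrates the concept only; it is not the published ensemble framework, which also explores optimized ensemble weights.

```python
# Sketch: hybrid crop model-ML ensemble with simple prediction averaging.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor

def hybrid_ensemble_predict(X_weather, X_apsim, y, X_weather_new, X_apsim_new):
    """Train several learners on [weather | crop-model outputs] and average them."""
    X_train = np.hstack([X_weather, X_apsim])
    X_new = np.hstack([X_weather_new, X_apsim_new])
    learners = [LassoCV(cv=5),
                RandomForestRegressor(n_estimators=200, random_state=0),
                XGBRegressor(n_estimators=200, random_state=0)]
    preds = []
    for m in learners:
        m.fit(X_train, y)
        preds.append(m.predict(X_new))
    # Unweighted averaging; optimized ensemble weights are not reproduced here.
    return np.mean(preds, axis=0)

# Example call with random placeholder arrays:
rng = np.random.default_rng(1)
yhat = hybrid_ensemble_predict(rng.normal(size=(200, 6)), rng.normal(size=(200, 4)),
                               rng.normal(size=200),
                               rng.normal(size=(50, 6)), rng.normal(size=(50, 4)))
print(yhat.shape)  # (50,)
```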
“…Although there is always a tradeoff between a model's complexity and its interpretability, recent, more complex models can better capture both linear and nonlinear relationships between the variables associated with crop yields, resulting in more accurate predictions that, in turn, better support decision makers (Chlingaryan et al., 2018). These models span from methods as simple as linear regression, k-nearest neighbors, and regression trees (González Sánchez et al., 2014; Mupangwa et al., 2020), to more complex methods such as support vector machines (Stas et al., 2016), homogeneous ensemble models (Vincenzi et al., 2011; Fukuda et al., 2013; Heremans et al., 2015; Jeong et al., 2016; Shahhosseini et al., 2019), heterogeneous ensemble models (Cai et al., 2017; Shahhosseini et al., 2020, 2021), and deep neural networks (Liu et al., 2001; Drummond et al., 2003; Jiang et al., 2004, 2020; Pantazi et al., 2016; You et al., 2017; Crane-Droesch, 2018; Wang et al., 2018; Khaki and Wang, 2019; Kim et al., 2019; Yang et al., 2019; Khaki et al., 2020a,b). Homogeneous ensemble models are built from base learners of the same type, whereas heterogeneous ensemble models combine base learners of different types.…”
Section: Introduction
confidence: 99%
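
The distinction this excerpt draws between homogeneous and heterogeneous ensembles can be shown with a short scikit-learn sketch on synthetic data; the estimators and hyperparameters below are illustrative choices, not those of the cited studies.

```python
# Sketch: homogeneous ensemble (many copies of one base learner) vs.
# heterogeneous ensemble (different base learners combined by a meta-learner).
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor, StackingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=400, n_features=10, noise=10.0, random_state=0)

# Homogeneous: bagged decision trees (same base learner repeated).
homogeneous = BaggingRegressor(DecisionTreeRegressor(), n_estimators=100,
                               random_state=0)

# Heterogeneous: different base learners stacked under a Ridge meta-learner.
heterogeneous = StackingRegressor(
    estimators=[("ridge", Ridge()), ("svr", SVR()),
                ("rf", RandomForestRegressor(n_estimators=100, random_state=0))],
    final_estimator=Ridge())

for name, model in [("homogeneous (bagged trees)", homogeneous),
                    ("heterogeneous (stacking)", heterogeneous)]:
    score = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean CV R^2 = {score:.3f}")
```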