Background
The use of visible-near infrared (vis-NIR) spectroscopy for rapid soil characterisation has attracted considerable interest in recent years. Soil absorbance spectra in the vis-NIR range can be calibrated using regression models to predict a set of soil properties. The accuracy of these regression models relies heavily on the calibration set. Choosing an optimum sample size and a representative set of calibration samples could further improve model performance. However, there is no guideline on which sampling method should be used for datasets of different sizes.

Methods
Here, we show that different sampling algorithms perform differently depending on dataset size and regression model (Cubist regression tree and Partial Least Squares Regression (PLSR)). We analysed the effect of three sampling algorithms, Kennard-Stone (KS), conditioned Latin Hypercube Sampling (cLHS) and k-means clustering (KM), against random sampling on the prediction of up to five soil properties (sand, clay, carbon content, cation exchange capacity and pH) on three datasets. These datasets differ in coverage: a European continental dataset (LUCAS, n = 5,639), a regional dataset from Australia (Geeves, n = 379), and a local dataset from New South Wales, Australia (Hillston, n = 384). Calibration sample sizes ranging from 50 to 3,000 were derived and tested for the continental dataset, and from 50 to 200 for the regional and local datasets.

Results
Overall, PLSR gives better predictions than the Cubist model across the soil properties tested. It is also less sensitive to the choice of sampling algorithm. The KM algorithm is more representative in the larger dataset, up to a certain calibration sample size. The KS algorithm appears to be more efficient than random sampling in small datasets; however, its prediction performance varied considerably between soil properties. The cLHS algorithm is the most robust sampling method for multiple soil properties, regardless of sample size.

Discussion
Our results suggest that the optimum calibration sample size depends on how much generalization the model has to achieve. Using a sampling algorithm is more beneficial for larger datasets than for smaller ones, where only small improvements can be made. KM is suitable for large datasets, KS is efficient in small datasets but its results can be variable, while cLHS is less affected by sample size.
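The abstract names the Kennard-Stone algorithm without describing it. As a rough illustration only, the sketch below follows the commonly described form of the procedure: start from the two most distant samples, then repeatedly add the candidate whose distance to its nearest already-selected sample is largest. The function name `kennard_stone`, the use of Euclidean distance on raw values, and the toy data are assumptions made for this example; the study may have used a different distance or worked on principal component scores of the spectra.

```python
import math


def kennard_stone(points, k):
    """Return the indices of k points picked by the Kennard-Stone rule.

    points: sequence of equal-length numeric tuples (hypothetical toy input).
    """
    n = len(points)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of points")

    # Full pairwise Euclidean distance matrix (fine for small examples).
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Step 1: seed the selection with the two most distant points.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]

    # min_d[c] holds the distance from point c to its nearest selected point.
    min_d = [min(dist[c][first], dist[c][second]) for c in range(n)]

    # Step 2: repeatedly add the candidate farthest from the selected set.
    while len(selected) < k:
        remaining = [c for c in range(n) if c not in selected]
        nxt = max(remaining, key=lambda c: min_d[c])
        selected.append(nxt)
        min_d = [min(min_d[c], dist[c][nxt]) for c in range(n)]

    return selected


# Invented two-variable "samples", not data from the study.
toy = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.5, 0.5), (0.9, 1.0)]
print(kennard_stone(toy, 3))
```

Because each new sample is chosen to be as far as possible from those already selected, the method covers the extremes of the predictor space first, which is consistent with it being efficient at small calibration sizes.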
Subsoil alkalinity is a common issue in the alluvial cotton-growing valleys of northern New South Wales (NSW), Australia. Soil alkalinity can cause nutrient deficiencies and toxic effects, and can restrict rooting depth, all of which can have a detrimental impact on crop production. The depth at which a soil constraint is reached is important information for land managers, but it is difficult to measure or predict spatially. This study predicted the depth at which a pH (H2O) constraint (>9) was reached, at a 1-cm vertical resolution down to a depth of 100 cm, on a 1070-hectare dryland cropping farm. Equal-area quadratic smoothing splines were used to resample vertical soil profile data, and a random forest (RF) model was used to produce the depth-to-soil pH constraint map. Under leave-one-site-out cross-validation, the RF model achieved a Lin’s Concordance Correlation Coefficient (LCCC) of 0.63–0.66 and a Root Mean Square Error (RMSE) of 0.47–0.51. Approximately 77% of the farm was found to be constrained by a strongly alkaline pH greater than 9 (H2O) somewhere within the top 100 cm of the soil profile. The relationship between the predicted depth-to-soil pH constraint map and cotton and grain (wheat, canola, and chickpea) yield monitor data was analyzed for individual fields. Yield increased when the soil pH constraint was deeper in the profile, with a good relationship for wheat, canola, and chickpea, and a weaker relationship for cotton. Overall, the results suggest that the modelling approach is valuable for identifying the depth-to-soil pH constraint and could be adopted for other important subsoil constraints, such as sodicity. The outputs also offer a promising opportunity to understand crop yield variability, which could lead to improvements in management practices.
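For readers unfamiliar with the two validation statistics reported above, the following minimal sketch computes them from paired observed and predicted values. It assumes the usual definitions: Lin's concordance correlation coefficient as 2·cov / (var_obs + var_pred + (mean_obs − mean_pred)²) with population (1/n) moments, and RMSE as the square root of the mean squared difference. The function names and the toy numbers are hypothetical and are not taken from the study.

```python
import math


def lccc(observed, predicted):
    """Lin's concordance correlation coefficient using 1/n moments."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p)
              for o, p in zip(observed, predicted)) / n
    # The squared mean difference penalises systematic bias, which an
    # ordinary correlation coefficient would ignore.
    return 2 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)


def rmse(observed, predicted):
    """Root mean square error between paired values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2
                         for o, p in zip(observed, predicted)) / n)


# Invented values for illustration only.
obs = [1.0, 2.0, 3.0]
pred = [2.0, 3.0, 4.0]  # perfectly correlated but biased by +1
print(lccc(obs, pred), rmse(obs, pred))
```

In the toy case the predictions are perfectly correlated with the observations, yet the LCCC falls well below 1 because of the constant offset. This is why the statistic is often reported alongside RMSE when assessing agreement with the 1:1 line.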