Investigating the need for preprocessing of near-infrared spectroscopic data as a function of sample size

Schoot, Mark; Kapper, C.; Kollenburg, Geert H. van; Postma, G.J.; Kessel, G. van; Buydens, L.M.C.; Jansen, Jeroen J.

doi:10.1016/j.chemolab.2020.104105

Cited by 50 publications

(31 citation statements)

References 17 publications

“…Given that the sample size steps were taken larger, those results still confirm the theory based on the n/d given by Vapnik [20]. Complementary, in the work by Shoot [15], an absolute threshold of 100 samples was detected to be optimal for calibration models involving some food products. The optimal complexity of the models in the mentioned study was chosen as the location of the minimum RMSECV, which can correspond to RMSECV curves of long tails as it was the case of the milk data for the prediction of lactose content.…”

Section: About the Sample Size (supporting)

confidence: 74%
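The complexity-selection rule described in the statement above (take the number of latent variables at the minimum RMSECV) can be sketched as follows. This is only an illustration: the function name `best_n_components` and the RMSECV values are hypothetical and are not taken from either paper. The made-up curve has a long, nearly flat tail, which is the situation the statement warns about.

```python
def best_n_components(rmsecv):
    """Return the number of latent variables with the lowest RMSECV.

    rmsecv[i] is assumed to hold the cross-validated RMSE of a model
    built with i + 1 latent variables.
    """
    best_index = min(range(len(rmsecv)), key=lambda i: rmsecv[i])
    return best_index + 1


# Hypothetical RMSECV curve with a long flat tail (illustrative values only).
example_curve = [0.90, 0.55, 0.40, 0.36, 0.35, 0.349, 0.348, 0.352]
chosen = best_n_components(example_curve)
```

With a curve like this, the strict minimum sits several components past the point where the error has effectively levelled off, so the selected complexity is sensitive to small fluctuations in the tail.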

“…This means that a different sample size n will be required for an easy-to-predict dominant chemical constituent than for a minor constituent that is harder to predict. The same idea was commented by Schoot about the sample size for calibration models of different products [15], although no further analysis was provided to explain it. This insight was revealed by the analysis of the ratio n/d and the optimal sample sizes obtained in each case study.…”

Section: About the Sample Size (mentioning)

confidence: 97%

“…Although this sample size was suggested, no deeper analysis on the generalization of it was provided. Another recent study related to the study on the sample size suggested a size threshold of 100 samples based on the evidence found with calibration models for grain, dairy, petfood and compound food products [15]. In this latter study, the focus was mainly on the effect of the sample size for the effectiveness of preprocessing methods.…”

Section: Introduction (mentioning)

confidence: 99%

Cost-efficient unsupervised sample selection for multivariate calibration

Diaz, Ketelaere, Aernouts, et al. 2021

Chemometrics and Intelligent Laboratory Systems

“…Prior to further modeling, several methods were selected to improve the original spectra. The spectra preprocessing stage was an essential aspect of multivariate calibration, majorly aimed at eliminating irrelevant background signal or distortion in the original spectra, in a bid to enhance predictive accuracy or data interpretation, thus, maximizing the correlation to the desired quality parameters ( Schoot et al., 2020 ; Singpoonga et al., 2020 ) Meanwhile, PLSR was used to correlate the independent variables (absorbance spectra) with the dependent variables, the SSC and water content of intact zucchini, bitter gourd, ridge gourd, melon, chayote, cucumber (desired quality parameters). A single model was adopted for to evaluate the SSC of all samples, while another calibration model was employed to estimate the water content.…”

Section: Results (mentioning)

confidence: 99%

Multi-product calibration model for soluble solids and water content quantification in Cucurbitaceae family, using visible/near-infrared spectroscopy

Kusumiyati, Hadiwijaya, Putri, et al. 2021

Heliyon

Latest studies on Vis/NIR research mostly focused on particular products. Developing a model for a specific product is costly and laborious. This study utilized visible/near-infrared (Vis/NIR) spectroscopy to evaluate the quality attributes of six products of the Cucurbitaceae family, with a single estimation model, rather than individually. The study made use of six intact products, zucchini, bitter gourd, ridge gourd, melon, chayote, and cucumber. Subsequently, the multi-product models for soluble solids content (SSC) and water content were created using partial least squares regression (PLSR) method. The PLSR modeling produced satisfactory results, the coefficient of determination in calibration set (R²c) was discovered to be 0.95 and 0.92, while the root mean squares error of calibration (RMSEC) was found to be 0.41 and 0.61, for SSC and water content, respectively. These models were able to accurately predict the unknown samples with coefficient of determination in prediction set (R²p) of 0.96 and 0.92, as well as root mean squares error of prediction (RMSEP) of 0.32 and 0.58, while the ratio of prediction to deviation (RPD) was found to be 5.68 and 3.69 for SSC and water content, respectively. This shows Vis/NIR spectroscopy was able to quantify the SSC and water content of six products of Cucurbitaceae family, using a single model.
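For readers unfamiliar with the figures of merit quoted in this abstract, the sketch below computes a coefficient of determination, an RMSE, and an RPD from reference and predicted values. It assumes the common conventions that RMSE divides by the number of samples and that RPD is the sample standard deviation of the reference values divided by the RMSE; the abstract does not state which conventions the authors used. All function names and the data are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rpd(y_true, y_pred):
    """Assumed definition: sample SD of the reference values over RMSE."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    sd = math.sqrt(sum((t - mean_true) ** 2 for t in y_true) / (n - 1))
    return sd / rmse(y_true, y_pred)


# Made-up reference and predicted values, for illustration only.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
metrics = (r_squared(reference, predicted), rmse(reference, predicted), rpd(reference, predicted))
```

Applied to a calibration set these give R²c and RMSEC; applied to an independent prediction set they give R²p and RMSEP.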

“…[23,26]), and smoothing (Savitzky-Golay (SG) smoothing, etc. [27,28]). The above pre-processing methods effectively remove the effects of the instrument background or drift on the signal, eliminate the effects of scattering due to the inhomogeneous distribution of particles and different particle sizes on the spectrum, remove or reduce noise, and improve the signal-to-noise ratio.…”

Section: Introduction (mentioning)

confidence: 99%
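The scatter correction described in the statement above is often carried out with multiplicative scatter correction (MSC). The sketch below is one textbook formulation, not necessarily the one used by the citing authors: each spectrum is regressed on the mean spectrum, then the fitted offset is subtracted and the result divided by the fitted slope. The function name `msc` and the toy spectra are hypothetical.

```python
def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    spectra is a list of equal-length lists of absorbance values.
    Each spectrum x is modelled as x = intercept + slope * reference
    and corrected to (x - intercept) / slope.
    """
    n_points = len(spectra[0])
    # Reference spectrum: mean over all samples at each wavelength.
    reference = [sum(s[j] for s in spectra) / len(spectra) for j in range(n_points)]
    ref_mean = sum(reference) / n_points
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)

    corrected = []
    for spectrum in spectra:
        spec_mean = sum(spectrum) / n_points
        # Ordinary least-squares slope and intercept of spectrum vs reference.
        slope = sum((r - ref_mean) * (x - spec_mean)
                    for r, x in zip(reference, spectrum)) / ref_ss
        intercept = spec_mean - slope * ref_mean
        corrected.append([(x - intercept) / slope for x in spectrum])
    return corrected


# Toy data: one shape with different additive and multiplicative effects.
base = [1.0, 2.0, 4.0, 3.0]
toy_spectra = [base, [2.0 * b + 0.5 for b in base], [0.5 * b - 0.2 for b in base]]
toy_corrected = msc(toy_spectra)
```

Because the toy spectra differ only by an offset and a scale factor, all three collapse onto the same corrected curve, which is the effect MSC is meant to have on scatter-induced variation.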

Salinity Monitoring at Saline Sites with Visible–Near-Infrared Spectral Data

Liu, Bao, et al. 2021

Minerals

To address the global phenomenon of the salinisation of large land areas, a quantitative inversion model of the salinity of saline soils and soil visible–near-infrared (NIR) spectral data was developed by considering saline soils in Zhenlai County, Jilin Province, China as the research object. The original spectral data were first subjected to Savitzky–Golay (SG) smoothing, multiplicative scattering correction (MSC) pre-processing, and a combined transformation technique. The pre-processed spectral data were then analysed to construct the difference index (DI), ratio index (RI), and normalised difference index (NDI), and the Spearman rank correlation coefficient (r) between these three spectral indices and the salt content in the samples was calculated, while a combined spectral index (r > 0.8) was eventually selected as a sensitive spectral index. Finally, a quantitative inversion model for the salinity of saline soils was developed, and the model’s accuracy was evaluated based on partial least squares regression (PLSR), the random forest (RF) algorithm, and the radial basis function (RBF) neural network algorithm. The results indicated that the inversion of soil salt content using the selected combination of spectral indices based on the RBF neural network algorithm was the most effective, with the prediction model yielding an R² value of 0.950, a root mean square error (RMSE) of 1.014, and a relative percentage deviation (RPD) of 4.479, which suggested a good prediction effect.
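The three spectral indices named in this abstract are usually built from reflectance at a pair of bands. The sketch below uses the common two-band forms (difference, ratio, and normalised difference); the abstract does not give the exact formulas or band pairs, so both the definitions and the example reflectance values are assumptions, and the function names are hypothetical.

```python
def difference_index(r_i, r_j):
    """DI: difference between reflectance at bands i and j."""
    return r_i - r_j


def ratio_index(r_i, r_j):
    """RI: ratio of reflectance at band i to reflectance at band j."""
    return r_i / r_j


def normalised_difference_index(r_i, r_j):
    """NDI: difference of the two bands divided by their sum."""
    return (r_i - r_j) / (r_i + r_j)


# Made-up reflectance values at two bands, for illustration only.
band_i, band_j = 0.6, 0.2
indices = (
    difference_index(band_i, band_j),
    ratio_index(band_i, band_j),
    normalised_difference_index(band_i, band_j),
)
```

In a workflow like the one described, such indices would be computed for many band pairs and each candidate ranked by its correlation with the measured salt content.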
