Near infrared spectroscopy relies heavily on the collection of an appropriate population of samples for calibration and on the best mathematical procedure to obtain the most accurate calibration. The purpose of this study was to evaluate two algorithms (CENTER and SELECT) for defining the population and selecting samples for calibration. The selected samples were used to compare modified partial least squares regression (MPLSR) with modified stepwise regression (MSR) as calibration methods. The algorithms were developed to (i) establish the boundaries of a population of samples in terms of the standardized Mahalanobis distance (H) from the mean and (ii) select a small, structured set of samples for calibration using the standardized H distance between sample pairs. Two diverse populations of samples were used to test these approaches. Calibrations were performed using MPLSR and MSR. A standardized H distance of 3.0 from the mean was used as a boundary for excluding spectral outliers from a population, and a minimum standardized H distance between samples of 0.6 provided an adequate number of calibration samples for accurate predictions. Both regression methods provided acceptable validation statistics for crude protein, acid detergent fiber, and in vitro dry matter disappearance. The MPLSR calibration method gave an overall 18% improvement in standard error of performance (SEP) compared with the MSR calibration method.
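The two steps described above (a population boundary at a standardized H distance of 3.0 from the mean, then selection with a minimum standardized H distance of 0.6 between samples) can be illustrated with a minimal Python sketch. This is not the CENTER or SELECT code itself. Several things here are assumptions made for illustration: the input is taken to be a matrix of scores (for example, principal component scores of the spectra), H is standardized by dividing the squared Mahalanobis distance by the number of score columns, and selection is a simple greedy rule that keeps the sample with the most close neighbors and discards those neighbors. The function names are hypothetical.

```python
import numpy as np


def standardized_h_from_mean(scores):
    """Standardized H of each sample from the population mean.

    Assumption: H is the squared Mahalanobis distance divided by the
    number of score columns; the abstract does not give the formula.
    """
    scores = np.asarray(scores, dtype=float)
    centered = scores - scores.mean(axis=0)
    inv_cov = np.linalg.pinv(np.atleast_2d(np.cov(centered, rowvar=False)))
    d2 = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    return d2 / scores.shape[1]


def pairwise_standardized_h(scores):
    """Standardized H between every pair of samples (same assumption)."""
    scores = np.asarray(scores, dtype=float)
    inv_cov = np.linalg.pinv(np.atleast_2d(np.cov(scores, rowvar=False)))
    diff = scores[:, None, :] - scores[None, :, :]
    d2 = np.einsum("abj,jk,abk->ab", diff, inv_cov, diff)
    return d2 / scores.shape[1]


def select_calibration(scores, max_h=3.0, min_pair_h=0.6):
    """Return (inlier indices, chosen calibration indices).

    Step 1: drop samples whose H from the mean exceeds max_h.
    Step 2: greedy selection (an assumed rule): repeatedly keep the
    remaining sample with the most neighbors closer than min_pair_h,
    then remove that sample and its neighbors from the pool.
    """
    scores = np.asarray(scores, dtype=float)
    inliers = np.flatnonzero(standardized_h_from_mean(scores) <= max_h)
    pair_h = pairwise_standardized_h(scores[inliers])

    remaining = np.ones(len(inliers), dtype=bool)
    chosen = []
    while remaining.any():
        # Neighbor relation restricted to samples still in the pool.
        close = (pair_h < min_pair_h) & remaining[None, :] & remaining[:, None]
        counts = close.sum(axis=1)
        counts[~remaining] = -1  # never pick an already removed sample
        pick = int(np.argmax(counts))
        chosen.append(int(inliers[pick]))
        remaining &= ~close[pick]  # removes the pick and its neighbors
        remaining[pick] = False
    return inliers, np.array(chosen)
```

Under these assumptions, calling `select_calibration(scores)` on a score matrix returns the indices that survive the 3.0 boundary and the smaller subset kept for calibration. Every pair of kept samples is at least 0.6 apart in standardized H, which is what makes the set small and structured.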
The computer programs CENTER and SELECT have been presented as a way to establish population boundaries and choose samples for near infrared calibrations. This study was conducted to evaluate calibrations derived from samples chosen by CENTER and SELECT from broad groups of hay, haylage, corn (Zea mays L.), wheat (Triticum aestivum L.), and barley (Hordeum vulgare L.) samples. Population boundaries were established with a maximum standardized H distance from the average spectrum of 3.0. Every fifth sample was reserved for equation validation. Calibration samples were selected with a minimum standardized H distance between samples of 0.6. Forage samples were found to have more diverse spectra and chemistry than grain samples. Average r² values were smaller, numbers of eigenvectors were larger, and standard deviations of laboratory reference values were larger for forages than for grains. The standard error of performance (SEP) for all samples and the SEP for samples chosen by SELECT with a limit of 0.6 were similar for four of five products. Calibrations were developed using five different math treatments with and without multiplicative scatter correction (De-trend). First derivative was the best math treatment for protein in all products. Second derivative was best for acid-detergent fiber (ADF) in forage products, but no single math treatment was superior for ADF in grain products. De-trend improved SEP in 28 of 50 calibrations.
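The validation scheme above (every fifth sample held out, equations compared by SEP) can be sketched in a few lines of Python. Two points are assumptions: the abstract does not say which position in each group of five is reserved, so the fifth is used here, and SEP is computed as the bias-corrected standard deviation of the prediction residuals, which is one common definition but is not stated in the abstract. The function names are hypothetical.

```python
import numpy as np


def split_every_fifth(n_samples):
    """Return (calibration indices, validation indices).

    Assumption: the 5th, 10th, 15th, ... samples are reserved.
    """
    idx = np.arange(n_samples)
    is_validation = (idx % 5) == 4
    return idx[~is_validation], idx[is_validation]


def sep(reference, predicted):
    """Standard error of performance.

    Assumption: SEP is the sample standard deviation (n - 1 divisor)
    of the residuals, so a constant bias does not contribute.
    """
    residuals = np.asarray(predicted, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.std(residuals, ddof=1))
```

With this definition a prediction that is off by the same amount for every sample has an SEP of zero, so bias would need to be reported separately.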