Selecting optimal combinations of preprocessing methods is a major
bottleneck in chemometric analysis. The analyst decides which method(s)
to apply to the data, frequently by highly subjective or inefficient
means, such as user experience or trial and error. Here, we present
a user-friendly method using optimal experimental designs for selecting
preprocessing transformations. We applied this strategy to optimize
partial least squares regression (PLSR) analysis of Stokes Raman spectra
to quantify hydroxylammonium (0–0.5 M), nitric acid (0–1
M), and total nitrate (0–1.5 M) concentrations. The best PLSR
model chosen by a determinant (D)-optimal design comprising 26 samples
(i.e., combinations of preprocessing methods) was compared with PLSR
models built with no preprocessing, a user-selected preprocessing
method (i.e., trial and error), and a user-defined design strategy
(576 samples). The D-optimal selection strategy improved PLSR prediction
performance by more than 50% compared with the model built on raw data and
reduced the number of combinations evaluated by approximately 95.5% (26 versus 576).
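
To illustrate how a D-optimal design can pick 26 runs out of 576 candidate preprocessing combinations, the sketch below uses a simple greedy search that maximizes the determinant of the information matrix. It is not the authors' algorithm or software: the factor levels in `LEVELS` are hypothetical (chosen only so that the full factorial has 576 rows), the main-effects model is an assumption, and the greedy search stands in for the exchange algorithms that design-of-experiments packages typically use.

```python
import itertools
import numpy as np

# Hypothetical factor levels (NOT from the source): five preprocessing
# factors whose full factorial has 4 * 4 * 4 * 3 * 3 = 576 combinations.
LEVELS = (4, 4, 4, 3, 3)


def candidate_set(levels):
    """Enumerate every combination of factor levels (full factorial)."""
    return np.array(list(itertools.product(*[range(k) for k in levels])))


def model_matrix(cands, levels):
    """Intercept plus dummy-coded main effects (level 0 is the reference).

    The main-effects model is an assumption made for this sketch.
    """
    cols = [np.ones(len(cands))]
    for j, k in enumerate(levels):
        for level in range(1, k):
            cols.append((cands[:, j] == level).astype(float))
    return np.column_stack(cols)


def d_optimal_greedy(X, n_runs, ridge=1e-6):
    """Greedily pick rows of X that maximize det(X_s^T X_s).

    By the matrix determinant lemma, adding row x multiplies
    det(M) by (1 + x^T M^{-1} x), so each step picks the unused row
    with the largest x^T M^{-1} x. A small ridge keeps M invertible
    before enough rows have been selected.
    """
    n, p = X.shape
    M = ridge * np.eye(p)
    available = np.ones(n, dtype=bool)
    chosen = []
    for _ in range(n_runs):
        M_inv = np.linalg.inv(M)
        gain = np.einsum("ij,jk,ik->i", X, M_inv, X)
        gain[~available] = -np.inf  # never select the same run twice
        i = int(np.argmax(gain))
        chosen.append(i)
        available[i] = False
        M += np.outer(X[i], X[i])
    return chosen


cands = candidate_set(LEVELS)
X = model_matrix(cands, LEVELS)
design = d_optimal_greedy(X, n_runs=26)

# Each selected row is one preprocessing combination to evaluate with PLSR.
print(cands[design])
print(f"fraction of combinations avoided: {1 - len(design) / len(cands):.3f}")
```

Each selected row would then be used to preprocess the spectra and fit one PLSR model; the resulting prediction errors serve as the responses from which the best preprocessing combination is identified.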