The methodology and mathematical treatment of several classic multivariate
methods for the analysis of spectroscopic data are demonstrated in
a straightforward way that can be used as a basis for teaching an
undergraduate introductory course on chemometric analysis. The multivariate
techniques of classical least squares (CLS), principal component regression
(PCR), and partial least squares (PLS), as well as the univariate
Beer’s law method, are described and compared, building
students’ understanding by starting with the univariate method
and progressing step by step to the multivariate methods. Equations
for the production of regression vectors from training set spectral
data are described and their use demonstrated for the prediction of
constituent concentrations on a separate validation set of spectra.
Care is taken throughout to keep the formatting of variables and
data matrices consistent. This provides a key foundation for understanding how
spectral data are manipulated using these different mathematical approaches
for building quantitative regression models. Each method is applied
to a real-world data set, and the results are discussed to show students
the types of information that can be gleaned from each method. A training
set of 20 infrared absorbance spectra of mixtures of known composition
containing three constituents (benzene, polystyrene, and gasoline) is used
to demonstrate the matrix operations for each regression method. A
separate set of 12 real-world napalm samples (containing benzene,
polystyrene, and gasoline) is used as a validation set to demonstrate
how the regression models are applied to an unknown data set.
A toolbox (PNNL Chemometric Toolbox) written in MATLAB is
supplied in the Supporting Information and can be used as a companion
for understanding the development and deployment of the chemometric
algorithms described in this paper. The data sets of the infrared
spectra are also supplied, allowing users to build and inspect the
chemometric models on their own. Finally, the Toolbox includes scripts
to assist users in loading their own data sets into MATLAB and performing
CLS, PCR, and PLS on their data.
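
As an illustration of the calibrate-then-predict workflow summarized above, the
following is a minimal sketch of CLS on synthetic, noise-free data. It is not
taken from the PNNL Chemometric Toolbox (which is written in MATLAB), and the
notation is an assumption: absorbance is modeled as A = C K, with samples in
rows of A and C and pure-component spectra in rows of K. All numbers, the two
constituents, the four wavelengths, and the helper names (transpose, matmul,
inverse_2x2) are hypothetical and chosen only to keep the example small and
free of dependencies.

```python
# Classical least-squares (CLS) sketch on synthetic, noise-free data.
# Assumed convention (not confirmed by the source): A = C K, samples in rows.
# All values are invented for illustration; they are not the paper's data.


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse_2x2(m):
    """Invert a 2 x 2 matrix (enough for a two-constituent example)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical pure-component spectra K (2 constituents x 4 wavelengths).
K_true = [[1.0, 0.5, 0.1, 0.0],
          [0.0, 0.2, 0.8, 1.0]]

# Hypothetical training concentrations C (3 samples x 2 constituents).
C_train = [[1.0, 0.0],
           [0.0, 1.0],
           [0.5, 0.5]]

# Simulate training spectra with Beer's law in matrix form: A = C K.
A_train = matmul(C_train, K_true)

# Calibration: estimate the pure-component spectra,
# K_hat = (C^T C)^-1 C^T A.
Ct = transpose(C_train)
K_hat = matmul(matmul(inverse_2x2(matmul(Ct, C_train)), Ct), A_train)

# Regression matrix B = K_hat^T (K_hat K_hat^T)^-1 (wavelengths x constituents).
# Each column of B is the regression vector for one constituent.
Kt = transpose(K_hat)
B = matmul(Kt, inverse_2x2(matmul(K_hat, Kt)))

# Prediction on a separate "validation" spectrum: c_pred = a B.
c_valid_true = [[0.3, 0.7]]
a_valid = matmul(c_valid_true, K_true)
c_pred = matmul(a_valid, B)

print([round(v, 6) for v in c_pred[0]])
```

Because the synthetic spectra contain no noise and obey the assumed linear
model exactly, the predicted concentrations should match the simulated ones to
within floating-point error. Real spectra would not, which is one motivation
for comparing CLS with PCR and PLS.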