Mass spectrometry is commonly used to identify the species
present in microbial samples, but the high similarity in the peptide
composition between strains of a single species has made analysis
at the subspecies level challenging. Prior research in this area has
employed methods such as Principal Component Analysis (PCA), the k-Nearest
Neighbors (kNN) algorithm, and Pearson correlation. Previously,
1D cross-correlation of mass spectra has been shown to be useful in
the classification of small molecule compounds as well as in the identification
of peptide sequences via the SEQUEST algorithm and its variants. While
direct application of cross-correlation to mass spectral data has
been shown to aid in the identification of many other types of compounds,
this type of analysis has not been demonstrated in the literature
for the purpose of LC-MS based identification of microbial strains.
A method of identifying microbial strains is presented here that applies
the principle of 2D cross-correlation to LC-MS data. For a set of N = 30
yeast isolate samples representing 5 yeast strains
(K-97, S-33, T-58, US-05, WB-06), high-resolution LC-MS-Orbitrap data
were collected. Reference spectra were then generated for each strain
from the combined data of all samples of that strain. The strain of each
sample was then predicted by computing the 2D cross-correlation of the
sample against each reference spectrum, followed by application of correction
factors measuring the asymmetry of the 2D correlation functions.
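
The classification step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each LC-MS run has already been binned into a 2D intensity map (retention time by m/z), that "combined data" means an element-wise mean, and that the strain with the highest normalized correlation peak is chosen. The asymmetry-based correction factors are omitted because their form is not specified here. All function names and the toy data are hypothetical.

```python
import numpy as np


def cross_correlate_2d(sample, reference):
    """Full 2D cross-correlation of two intensity maps via zero-padded FFTs."""
    rows = sample.shape[0] + reference.shape[0] - 1
    cols = sample.shape[1] + reference.shape[1] - 1
    f_sample = np.fft.rfft2(sample, s=(rows, cols))
    f_reference = np.fft.rfft2(reference, s=(rows, cols))
    # Multiplying by the complex conjugate yields correlation, not convolution.
    return np.fft.irfft2(f_sample * np.conj(f_reference), s=(rows, cols))


def similarity(sample, reference):
    """Correlation peak, normalized so an identical (possibly shifted) map scores 1."""
    norm = np.linalg.norm(sample) * np.linalg.norm(reference)
    if norm == 0.0:
        return 0.0
    return float(cross_correlate_2d(sample, reference).max() / norm)


def build_reference(sample_maps):
    """Assumption: 'combined data' is taken to be the element-wise mean."""
    return np.mean(np.stack(sample_maps), axis=0)


def predict_strain(sample, references):
    """Return the best-scoring strain name and the score for every reference."""
    scores = {name: similarity(sample, ref) for name, ref in references.items()}
    return max(scores, key=scores.get), scores


def toy_map(peaks, shape=(20, 30)):
    """Hypothetical intensity map with unit peaks at (retention bin, m/z bin)."""
    intensity = np.zeros(shape)
    for rt_bin, mz_bin in peaks:
        intensity[rt_bin, mz_bin] = 1.0
    return intensity


# Synthetic placeholder data, not measurements from the study.
references = {
    "strain_A": build_reference([toy_map([(5, 10), (12, 20)])] * 3),
    "strain_B": build_reference([toy_map([(5, 12), (12, 18)])] * 3),
}
# A sample of strain_A whose peaks drift by one retention-time bin.
sample = np.roll(references["strain_A"], 1, axis=0)
predicted, scores = predict_strain(sample, references)
print(predicted, scores)
```

Taking the maximum over all lags, rather than the zero-lag value alone, makes the score tolerant of small retention-time or m/z drift between a sample and a reference. That tolerance is the main reason to prefer cross-correlation over a plain Pearson correlation in this sketch.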