Near-infrared (NIR) spectroscopy is a promising technique for field
identification of substandard and falsified drugs because it is portable,
rapid, and nondestructive, and it can differentiate many formulated
pharmaceutical products. Portable NIR spectrometers rely heavily on
chemometric analyses
based on libraries of NIR spectra from authentic pharmaceutical samples.
However, it is difficult to build comprehensive product libraries
in many low- and middle-income countries due to the large numbers
of manufacturers who supply these markets, frequent unreported changes
in materials sourcing and product formulation by the manufacturers,
and a general lack of cooperation in providing authentic samples. In
this work, we show that a simple library of lab-formulated binary
mixtures of an active pharmaceutical ingredient (API) with one of two diluents
gave good performance on field screening tasks, such as discriminating
substandard and falsified formulations of the API. Six data analysis
models, including principal component analysis, support-vector
machine classification and regression methods, and convolutional neural
networks, were trained on binary mixtures of acetaminophen with either
lactose or ascorbic acid. While the models all performed strongly
in cross-validation (on formulations similar to their training set),
they individually showed poor robustness for formulations outside
the training set. However, a predictive algorithm based on the six
models, trained only on binary samples, accurately predicted whether
the correct amount of acetaminophen was present in ternary mixtures,
genuine acetaminophen formulations, adulterated acetaminophen formulations,
and falsified formulations containing substitute APIs. This data analytics
approach may extend the utility of NIR spectrometers for analysis
of pharmaceuticals in low-resource settings.
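
The abstract does not state how the outputs of the six models are combined into a single prediction. As a purely illustrative sketch, one simple possibility is a threshold vote over per-model verdicts. The function name `consensus_verdict`, the boolean vote format, and the `min_agree` threshold below are all hypothetical and are not taken from this work.

```python
def consensus_verdict(model_votes, min_agree=4):
    """Combine per-model pass/fail verdicts into one screening result.

    model_votes: iterable of booleans, one per model; True means that
        model judged the sample to contain the correct amount of API.
    min_agree: hypothetical threshold; the number of models that must
        agree before the sample is reported as passing.

    Returns (passed, n_pass), where n_pass is the count of True votes.
    """
    votes = [bool(v) for v in model_votes]
    if not votes:
        raise ValueError("at least one model vote is required")
    n_pass = sum(votes)  # True counts as 1, False as 0
    return n_pass >= min_agree, n_pass


# Example: five of six hypothetical models accept the sample.
passed, n_pass = consensus_verdict([True, True, True, True, True, False])
print(passed, n_pass)
```

A vote of this kind lets models that fail on an unfamiliar formulation be outvoted by the others, which is one way an ensemble can be more robust than its individual members.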