Selecting and translating in vitro leads for a
disease into molecules with in vivo activity in an
animal model of the disease is a challenge that takes considerable
time and money. As an example, in recent years whole-cell phenotypic
screens of millions of compounds have yielded over 1500 inhibitors of Mycobacterium tuberculosis (Mtb).
These must be prioritized for testing in the mouse in vivo assay for Mtb infection, a validated model utilized
to select compounds for further testing. We demonstrate learning from in vivo active and inactive compounds using machine learning
classification models (Bayesian, support vector machines, and recursive
partitioning) built from a data set of 773 compounds. The Bayesian model correctly predicted
8 of 11 additional in vivo actives that were held out of
the model as an external test set. Curation of 70 years of Mtb data can therefore provide statistically robust computational
models to focus resources on in vivo active small
molecule antituberculars. This work highlights a cost-effective predictor
for in vivo testing that could also be applied to other diseases.
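
As a rough illustration of the workflow summarized above (train a Bayesian classifier on known in vivo actives and inactives, then score held-out actives as an external test set), a minimal sketch follows. It is a generic Bernoulli naive Bayes with Laplace smoothing over binary fingerprints, which is an assumption and not necessarily the Bayesian method used in this work. The function names, the 4-bit fingerprints, and the toy labels are all hypothetical and are not data from the study.

```python
import math

def train_bernoulli_nb(fingerprints, labels, alpha=1.0):
    """Fit a Bernoulli naive Bayes model on binary fingerprints (lists of 0/1).

    labels: 1 = in vivo active, 0 = in vivo inactive (toy labels, hypothetical).
    """
    n_bits = len(fingerprints[0])
    model = {}
    for cls in (0, 1):
        rows = [fp for fp, y in zip(fingerprints, labels) if y == cls]
        # Laplace-smoothed probability that each bit is set within this class
        p_bit = [
            (sum(fp[i] for fp in rows) + alpha) / (len(rows) + 2 * alpha)
            for i in range(n_bits)
        ]
        log_prior = math.log(len(rows) / len(labels))
        model[cls] = (log_prior, p_bit)
    return model

def predict_active(model, fp):
    """Return True if the 'active' class has the higher log-posterior."""
    scores = {}
    for cls, (log_prior, p_bit) in model.items():
        scores[cls] = log_prior + sum(
            math.log(p if bit else 1.0 - p) for bit, p in zip(fp, p_bit)
        )
    return scores[1] > scores[0]

def recall_on_actives(model, external_actives):
    """Count how many held-out actives the model predicts as active."""
    hits = sum(predict_active(model, fp) for fp in external_actives)
    return hits, len(external_actives)

# Toy training set: hypothetical 4-bit fingerprints, NOT data from the study.
train_fps = [
    [1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 1, 0],  # labeled active
    [0, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1],  # labeled inactive
]
train_labels = [1, 1, 1, 0, 0, 0]
model = train_bernoulli_nb(train_fps, train_labels)

# Hypothetical external test set of known actives that were not used in training.
external_actives = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 1, 1, 1]]
hits, total = recall_on_actives(model, external_actives)
print(f"toy external set: {hits} of {total} actives recovered")

# The hit rate reported in the text, 8 of 11 external actives:
reported_recall = 8 / 11
print(f"reported external recall: {reported_recall:.1%}")
```

The external test set is scored only after training, so the recovered fraction reflects how the model behaves on compounds it has not seen. For the figure stated in the text, 8 of 11 corresponds to a recall of roughly 73%.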