Tuberculosis (TB) is a major global
health challenge, with approximately
1.4 million deaths per year. There is still a need to develop novel
treatments for patients infected with Mycobacterium
tuberculosis (Mtb). Many large-scale
phenotypic screens have led to the identification
of thousands of new compounds. Yet there is very limited investment
in TB drug discovery, which points to the need for new methods to increase
the efficiency of drug discovery against Mtb. We
have used machine learning approaches to learn from the public Mtb data, resulting in many data sets and models with robust
enrichment and hit rates, leading to the discovery of new active compounds.
Recently, we have curated predominantly small-molecule Mtb data and developed new machine learning classification models with
18 886 molecules at different activity cutoffs. We now describe
the further validation of these Bayesian models using a library of
over 1000 molecules synthesized as part of the EU-funded New Medicines
for TB and More Medicines for TB programs. We highlight molecular
features that are enriched in these active compounds. In addition,
we provide new regression and classification models that can be used
to score compound libraries or to design new molecules. We
have also visualized these molecules in the context of known molecular
targets and identified clusters in chemical property space, which
may aid in future target identification efforts. Finally, we are
making these data sets publicly available, representing a significant
increase in the Mtb inhibition data available in
the public domain.
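
As a rough illustration of two ideas mentioned above, labeling compounds at different activity cutoffs and asking which molecular features are enriched among the actives, the following minimal sketch may help. It is not the Bayesian modeling workflow used in this work. The MIC values, the cutoffs, the feature names, and the helper functions `label_actives` and `feature_enrichment` are all hypothetical and chosen only for illustration. Enrichment is taken here to mean the active fraction among compounds carrying a feature divided by the overall active fraction.

```python
from collections import Counter


def label_actives(mic_values, cutoff):
    """Label a compound active (True) when its MIC is at or below the cutoff."""
    return [mic <= cutoff for mic in mic_values]


def feature_enrichment(feature_sets, labels):
    """Return {feature: enrichment} for every feature seen in the data.

    Enrichment = (active fraction among compounds with the feature)
                 / (active fraction over all compounds).
    Returns an empty dict when there are no compounds or no actives.
    """
    n_total = len(labels)
    n_active = sum(labels)
    if n_total == 0 or n_active == 0:
        return {}
    base_rate = n_active / n_total

    seen = Counter()         # compounds carrying each feature
    seen_active = Counter()  # active compounds carrying each feature
    for feats, is_active in zip(feature_sets, labels):
        for feat in set(feats):
            seen[feat] += 1
            if is_active:
                seen_active[feat] += 1
    return {f: (seen_active[f] / seen[f]) / base_rate for f in seen}


# Hypothetical toy data: MIC values (arbitrary units) and substructure labels.
toy_mics = [0.5, 2.0, 15.0, 80.0]
toy_features = [{"nitro", "amide"}, {"nitro"}, {"amide"}, {"ester"}]

# Assumed example cutoffs; the same data yield different labels at each one.
for cutoff in (1.0, 10.0):
    labels = label_actives(toy_mics, cutoff)
    print(cutoff, labels, feature_enrichment(toy_features, labels))
```

A feature with enrichment above 1 appears more often among actives than expected from the overall hit rate. In practice the features would come from a cheminformatics toolkit (for example, fingerprint bits) rather than hand-written labels, and small counts would call for some form of correction before ranking features.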