One of the grand challenges in contemporary chemical biology is the
generation of a probe for every member of the human proteome.
Probe selection and optimization strategies typically rely on experimental
bioactivity data to determine the potency and selectivity of candidate
molecules. However, this approach is profoundly limited by the sparsity
of the known data, the annotation bias often found in the literature,
and the cost of physical screening. Recent advancements in predictive
pharmacology, such as the application of multitask and transfer learning,
as well as the use of biologically motivated, structure-agnostic features
to characterize molecules, should serve to mitigate these issues.
Computational modeling likely offers the only cost-effective approach
to substantially increasing the bioactivity annotation density on both
the local and global scales and thus, we argue, will need to make
a substantial contribution if the ambitious goals of probing the human
proteome are to be realized in the foreseeable future.