Accurately predicting protein-ligand binding affinities is an important problem in computational chemistry since it can substantially accelerate drug discovery for virtual screening and lead optimization. We propose here a fast machine-learning approach for predicting binding affinities using state-of-the-art 3D-convolutional neural networks and compare this approach to other machine-learning and scoring methods using several diverse data sets. The results for the standard PDBbind (v.2016) core test-set are state-of-the-art with a Pearson's correlation coefficient of 0.82 and an RMSE of 1.27 in pK units between experimental and predicted affinity, but accuracy is still very sensitive to the specific protein used. K is made available via PlayMolecule.org for users to easily test their own protein-ligand complexes, with each prediction taking a fraction of a second. We believe that the speed, performance, and ease of use of K make it already an attractive scoring function for modern computational chemistry pipelines.
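For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how a Pearson correlation coefficient and an RMSE between experimental and predicted affinities could be computed. The function names and the toy pK values are illustrative assumptions; they are not code or data from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def rmse(xs, ys):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

# Hypothetical affinities in pK units (made up for illustration only).
experimental = [4.0, 5.5, 6.2, 7.8, 9.1]
predicted = [4.6, 5.1, 6.9, 7.2, 8.5]

print("Pearson r:", pearson_r(experimental, predicted))
print("RMSE (pK units):", rmse(experimental, predicted))
```

The two numbers answer different questions: the correlation measures how well the ranking and linear trend are recovered, while the RMSE measures the typical absolute error in pK units.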
Despite the many approaches to study differential splicing from RNA-seq, many challenges remain unsolved, including computing capacity and sequencing depth requirements. Here we present SUPPA2, a new method that addresses these challenges, and enables streamlined analysis across multiple conditions taking into account biological variability. Using experimental and simulated data, we show that SUPPA2 achieves higher accuracy compared to other methods, especially at low sequencing depth and short read length. We use SUPPA2 to identify novel Transformer2-regulated exons, novel microexons induced during differentiation of bipolar neurons, and novel intron retention events during erythroblast differentiation.
In this work, we propose a machine learning approach to generate novel molecules starting from a seed compound, its three-dimensional (3D) shape, and its pharmacophoric features. The pipeline draws inspiration from generative models used in image analysis and represents a first example of the de novo design of lead-like molecules guided by shape-based features. A variational autoencoder is used to perturb the 3D representation of a compound, followed by a system of convolutional and recurrent neural networks that generate a sequence of SMILES tokens. The generative design of novel scaffolds and functional groups can cover unexplored regions of chemical space that still possess lead-like properties.
Chemical space is impractically large, and conventional structure-based virtual screening techniques cannot be used to simply search through the entire space to discover effective bioactive molecules. To address this shortcoming, we propose a generative adversarial network to generate, rather than search, diverse three-dimensional ligand shapes complementary to the pocket. Furthermore, we show that the generated molecule shapes can be decoded using a shape-captioning network into a sequence of SMILES, directly enabling structure-based de novo drug design. We evaluate the quality of the method by both structure- (docking) and ligand-based [quantitative structure–activity relationship (QSAR)] virtual screening methods. For both evaluation approaches, we observed enrichment compared to random sampling from the initial chemical space of ZINC drug-like compounds.
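One simple way to quantify enrichment over random sampling is the ratio of hit rates between the generated set and a random reference set. The sketch below assumes a score where higher is better and a user-chosen hit threshold; the function names, the threshold, and the example scores are hypothetical, and the study itself may define enrichment differently.

```python
def hit_rate(scores, threshold):
    """Fraction of compounds whose score reaches the threshold."""
    return sum(1 for s in scores if s >= threshold) / len(scores)

def enrichment(generated_scores, random_scores, threshold):
    """Hit rate of the generated set divided by that of the random set.

    Values above 1 mean the generated compounds contain proportionally
    more hits than the random sample.
    """
    baseline = hit_rate(random_scores, threshold)
    if baseline == 0:
        raise ValueError("no hits in the random sample; ratio is undefined")
    return hit_rate(generated_scores, threshold) / baseline

# Hypothetical scores (made up for illustration only).
generated = [0.9, 0.8, 0.3, 0.2]
random_sample = [0.9, 0.1, 0.2, 0.3]
print(enrichment(generated, random_sample, threshold=0.5))
```

For docking scores, where lower values usually indicate better poses, the comparison inside `hit_rate` would need to be inverted.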
Graph neural networks are able to solve certain drug discovery tasks such as molecular property prediction and de novo molecule generation. However, these models are considered 'black-box' and 'hard-to-debug'. This study aimed to improve modeling transparency for rational molecular design by applying the integrated gradients explainable artificial intelligence (XAI) approach for graph neural network models. Models were trained for predicting plasma protein binding, cardiac potassium channel inhibition, passive permeability, and cytochrome P450 inhibition. The proposed methodology highlighted molecular features and structural elements that are in agreement with known pharmacophore motifs, correctly identified property cliffs, and provided insights into unspecific ligand-target interactions. The developed XAI approach is fully open-sourced and can be used by practitioners to train new models on other clinically-relevant endpoints.
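The integrated gradients idea can be illustrated on a much simpler model than a graph neural network. The sketch below applies it to a toy logistic model with an analytic gradient, approximating the path integral from a baseline to the input with a midpoint Riemann sum. The model, weights, baseline, and function names are assumptions chosen for illustration; this is not the study's implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model(x, w, b):
    """Toy logistic model standing in for a trained network."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def model_grad(x, w, b):
    """Analytic gradient of the toy model with respect to its inputs."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]

def integrated_gradients(x, baseline, w, b, steps=200):
    """Attribution_i = (x_i - baseline_i) * average gradient_i along the
    straight path from baseline to x (midpoint Riemann approximation)."""
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        for i, g in enumerate(model_grad(point, w, b)):
            totals[i] += g
    return [(xi - bi) * t / steps for xi, bi, t in zip(x, baseline, totals)]

# Hypothetical weights and input (made up for illustration only).
w, b = [1.5, -2.0, 0.5], 0.1
x, baseline = [1.0, 0.5, 2.0], [0.0, 0.0, 0.0]

attributions = integrated_gradients(x, baseline, w, b)
print(attributions)
# Completeness check: attributions sum (approximately) to f(x) - f(baseline).
print(sum(attributions), model(x, w, b) - model(baseline, w, b))
```

The completeness property checked at the end is what makes the attributions interpretable as a decomposition of the prediction difference; in a molecular setting the inputs would be atom and bond features rather than a flat vector.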
Feature attribution techniques are popular choices within the explainable artificial intelligence toolbox, as they can help elucidate which parts of the provided inputs used by an underlying supervised-learning method are considered relevant for a specific prediction. In the context of molecular design, these approaches typically involve the coloring of molecular graphs, whose presentation to medicinal chemists can be useful for deciding which compounds to synthesize or prioritize. The consistency of the highlighted moieties alongside expert background knowledge is expected to contribute to the understanding of machine-learning models in drug design. Quantitative evaluation of such coloring approaches, however, has so far been limited to substructure identification tasks. Here we present an approach that is based on maximum common substructure algorithms applied to experimentally-determined activity cliffs. Using the proposed benchmark, we found that molecule coloring approaches in conjunction with classical machine-learning models tend to outperform more modern, deep-learning-based alternatives. However, none of the tested feature attribution methods sufficiently and consistently generalized when confronted with unseen examples.