Inspired by natural language processing techniques, we here introduce Mol2vec, which is an unsupervised machine learning approach to learn vector representations of molecular substructures. Like the Word2vec models, where vectors of closely related words are in close proximity in the vector space, Mol2vec learns vector representations of molecular substructures that point in similar directions for chemically related substructures. Compounds can finally be encoded as vectors by summing the vectors of the individual substructures and, for instance, be fed into supervised machine learning approaches to predict compound properties. The underlying substructure vector embeddings are obtained by training an unsupervised machine learning approach on a so-called corpus of compounds that consists of all available chemical matter. The resulting Mol2vec model is pretrained once, yields dense vector representations, and overcomes drawbacks of common compound feature representations such as sparseness and bit collisions. The prediction capabilities are demonstrated on several compound property and bioactivity data sets and compared with results obtained for Morgan fingerprints as a reference compound representation. Mol2vec can be easily combined with ProtVec, which employs the same Word2vec concept on protein sequences, resulting in a proteochemometric approach that is alignment-independent and thus can also be easily used for proteins with low sequence similarities.
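The encoding step described above, in which a compound vector is the sum of its substructure vectors, can be illustrated with a minimal sketch. The substructure identifiers, the three-dimensional vectors, and the zero-vector fallback for unseen identifiers are all hypothetical placeholders chosen for illustration; they do not come from a trained Mol2vec model, and the function names are not the Mol2vec API.

```python
import math

# Hypothetical 3-dimensional embeddings keyed by made-up substructure identifiers.
# A trained model would supply these; the numbers here are purely illustrative.
SUBSTRUCTURE_VECTORS = {
    "id_a": [0.2, 0.1, -0.4],
    "id_b": [0.3, 0.0, -0.5],
    "id_c": [-0.6, 0.8, 0.1],
}

# Assumed fallback for identifiers that are missing from the vocabulary.
UNSEEN = [0.0, 0.0, 0.0]


def compound_vector(substructure_ids):
    """Encode a compound as the element-wise sum of its substructure vectors."""
    total = [0.0] * len(UNSEEN)
    for sid in substructure_ids:
        vec = SUBSTRUCTURE_VECTORS.get(sid, UNSEEN)
        for i, value in enumerate(vec):
            total[i] += value
    return total


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    print(compound_vector(["id_a", "id_b"]))
    print(cosine_similarity(SUBSTRUCTURE_VECTORS["id_a"], SUBSTRUCTURE_VECTORS["id_b"]))
```

In this toy vocabulary, "id_a" and "id_b" point in similar directions while "id_c" does not, which mirrors the idea that chemically related substructures end up with similar vector directions. The resulting dense compound vector could then be passed to any supervised learner.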
Background: Annotating the phylogenetic tree of the human kinome is an intuitive way to visualize compound profiling data, structural features of kinases, or functional relationships within this important class of proteins. The increasing volume and complexity of kinase-related data underline the need for a tool that enables complex queries pertaining to kinase disease involvement and potential therapeutic uses of kinase inhibitors.
Results: Here, we present KinMap, a user-friendly online tool that facilitates interactive navigation through kinase knowledge by linking biochemical, structural, and disease association data to the human kinome tree. To this end, preprocessed data from freely available sources, such as ChEMBL, the Protein Data Bank, and the Center for Therapeutic Target Validation platform, are integrated into KinMap and can easily be complemented by proprietary data. The value of KinMap is demonstrated by example for uncovering new therapeutic indications of known kinase inhibitors and for prioritizing kinases for drug development efforts.
Conclusion: KinMap represents a new generation of kinome tree viewers that facilitate interactive exploration of the human kinome. KinMap enables the generation of high-quality annotated images of the human kinome tree as well as the exchange of kinome-related data in scientific communications. Furthermore, KinMap supports multiple input and output formats, recognizes alternative kinase names, and links them to a unified naming scheme, which makes it a useful tool across different disciplines and applications. A web service of KinMap is freely available at http://www.kinhub.org/kinmap/.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1433-7) contains supplementary material, which is available to authorized users.
Highlights
► Molecular modelling of subtilisin-like protease 1 (SUB1) of three human malaria pathogens shows similarity in active site.
► Experimental examination of Plasmodium falciparum (Pf)SUB1 demonstrates unusual features of the active site.
► Recombinant expression of active Plasmodium vivax (Pv)SUB1, Plasmodium knowlesi (Pk)SUB1 and Plasmodium berghei (Pb)SUB1.
► Evidence for co-evolution of SUB1 orthologues and substrates following speciation.
► Production of substrate-based inhibitors with broad activity against SUB1 from three major human malarial pathogens.
Kinome-wide screening would have the advantage of providing structure-activity relationships against hundreds of targets simultaneously. Here, we report the generation of ligand-based activity prediction models for over 280 kinases by employing Machine Learning methods on an extensive data set of proprietary bioactivity data combined with open data. High quality (AUC > 0.7) was achieved for ∼200 kinases by (1) combining open with proprietary data, (2) choosing Random Forest over alternative tested Machine Learning methods, and (3) balancing the training data sets. Tests on left-out and external data indicate a high value for virtual screening projects. Importantly, the derived models are evenly distributed across the kinome tree, allowing reliable profiling prediction for all kinase branches. The prediction quality was further improved by employing experimental bioactivity fingerprints of a small kinase subset. Overall, the generated models can support various hit identification tasks, including virtual screening, compound repurposing, and the detection of potential off-targets.
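Two ingredients named above, balancing the training data and judging model quality by AUC, can be sketched in a few lines. The abstract does not say how the training sets were balanced, so random undersampling of the majority class is an assumption here, and the function names are hypothetical. The AUC is computed directly from its pairwise definition rather than with any particular library.

```python
import random


def balance_by_undersampling(samples, labels, seed=0):
    """Return a class-balanced subset by randomly undersampling the larger class.

    Assumption: labels are 1 (active) or 0 (inactive). The source only states
    that training sets were balanced, not that this strategy was used.
    """
    rng = random.Random(seed)
    actives = [s for s, y in zip(samples, labels) if y == 1]
    inactives = [s for s, y in zip(samples, labels) if y == 0]
    n = min(len(actives), len(inactives))
    chosen = rng.sample(actives, n) + rng.sample(inactives, n)
    return chosen, [1] * n + [0] * n


def roc_auc(labels, scores):
    """Area under the ROC curve from its pairwise definition.

    Equals the fraction of (active, inactive) pairs in which the active
    compound receives the higher score; ties count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one active and one inactive")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and predicted scores, invented for illustration only.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Under the quality criterion quoted above, a model would count as high quality when this value exceeds 0.7 on held-out data.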
RNA requires conformational dynamics to fulfill its diverse functional roles. Here, a new topological network representation of RNA structures is presented that allows analyzing RNA flexibility/rigidity based on constraint counting. The method extends the FIRST approach, which identifies flexible and rigid regions in atomic detail in a single, static, three-dimensional molecular framework. Initially, the network rigidity of a canonical A-form RNA is analyzed by counting constraints on network elements of increasing size. These considerations demonstrate that it is the inclusion of hydrophobic contacts into the RNA topological network that is crucial for an accurate flexibility prediction. The counting also explains why a protein-based parameterization results in overly rigid RNA structures. The new network representation is then validated on a tRNA(Asp) structure and all NMR-derived ensembles of RNA structures currently available in the Protein Data Bank (with chain length ≥40). The flexibility predictions demonstrate good agreement with experimental mobility data, and the results are superior to predictions based on two previously used network representations. Encouragingly, this holds for flexibility predictions as well as mobility predictions obtained by constrained geometric simulations on these networks. Potential applications of the approach to analyzing the flexibility of DNA and RNA/protein complexes are discussed.
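The idea behind constraint counting can be illustrated with simple Maxwell counting, a global estimate that is not the FIRST algorithm itself and is offered only as a sketch. It assumes that each atom is a point with three translational degrees of freedom, that every constraint is an independent distance constraint removing exactly one degree of freedom, and that six rigid-body motions are subtracted. The function name is hypothetical.

```python
def maxwell_floppy_modes(n_atoms, n_constraints):
    """Estimate internal degrees of freedom by Maxwell constraint counting.

    Assumptions: atoms are points in 3D (3 degrees of freedom each), all
    constraints are independent distance constraints, and the 6 rigid-body
    motions of the whole network are removed. Redundant constraints are not
    detected, so this is only a rough global estimate.
    """
    if n_atoms < 3:
        raise ValueError("counting with 6 rigid-body motions needs >= 3 atoms")
    if n_constraints < 0:
        raise ValueError("number of constraints cannot be negative")
    return max(0, 3 * n_atoms - 6 - n_constraints)


if __name__ == "__main__":
    # A four-membered ring of distance constraints keeps two internal motions.
    print(maxwell_floppy_modes(4, 4))
    # Adding the two diagonals (a tetrahedron) removes them.
    print(maxwell_floppy_modes(4, 6))
```

The second call shows the qualitative effect discussed above: adding constraints to a network, such as extra contacts, lowers the number of remaining internal motions, so too many constraints make a structure appear overly rigid.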
We report all-atom molecular dynamics and replica exchange molecular dynamics simulations, totaling ∼250 ns, on the unbound human immunodeficiency virus type-1 (HIV-1) transactivation responsive region (TAR) RNA structure and three TAR RNA structures in bound conformations. We compare the extent of observed conformational sampling with that of a conceptually simpler and computationally much cheaper constrained geometrical simulation approach, the framework rigidity optimized dynamic algorithm (FRODA). Atomic fluctuations obtained by replica-exchange molecular dynamics (REMD) simulations agree quantitatively with those obtained by molecular dynamics (MD) and FRODA simulations for the unbound TAR structure. Regarding the stereochemical quality of the generated conformations, backbone torsion angles and puckering modes of the sugar-phosphate backbone were reproduced equally well by MD and REMD simulations, but further improvement is needed in the case of FRODA simulations. Essential dynamics analysis reveals that all three simulation approaches show a tendency to sample bound conformations when starting from the unbound TAR structure, with MD and REMD simulations being superior to FRODA. These results are consistent with the experimental view that bound TAR RNA conformations are transiently sampled in the free ensemble, following a conformational selection model. The simulation-generated TAR RNA conformations have been successfully used as receptor structures for docking. This finding has important implications for RNA-ligand docking in that docking into an ensemble of simulation-generated RNA structures is shown to be a valuable means to cope with large apo-to-holo conformational transitions of the receptor structure.
Peptidic α-ketoamides have been developed as inhibitors of the malarial protease PfSUB1. The design of inhibitors was based on the best known endogenous PfSUB1 substrate sequence, leading to compounds with low micromolar to submicromolar inhibitory activity. SAR studies were performed indicating the requirement of an aspartate mimicking the P1' substituent and optimal P1-P4 length of the non-prime part. The importance of each of the P1-P4 amino acid side chains was investigated, revealing crucial interactions and size limitations.