Proteins are the molecular machines of life. The multitude of possible conformations that proteins can adopt determines their free-energy landscapes. However, the inherently high dimensionality of a protein free-energy landscape makes it difficult to decipher how proteins perform their functions. For this reason, dimensionality reduction is an active field of research for molecular biologists. Uniform manifold approximation and projection (UMAP) is a dimensionality reduction method based on a fuzzy topological analysis of data. In the present study, the performance of UMAP is compared with that of other popular dimensionality reduction methods, namely t-distributed stochastic neighbor embedding (t-SNE), principal component analysis (PCA), and time-structure based independent component analysis (tICA), in the context of analyzing molecular dynamics simulations of the circadian clock protein VIVID. A good dimensionality reduction method should faithfully preserve the structure of the data in the projected components. Comparing the raw high-dimensional data with the projections obtained from each method, using various metrics, showed that UMAP outperforms the linear reduction methods (PCA and tICA) and offers competitive performance at a scalable computational cost.
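To make the idea of a linear projection concrete, here is a minimal sketch of PCA in plain Python; it is not the study's pipeline. It centers a set of feature vectors (for example, flattened coordinates of MD frames), builds their covariance matrix, finds the leading eigenvector by power iteration, and projects each frame onto it. The function name `first_principal_component`, the iteration count, and the toy inputs are illustrative assumptions. tICA instead solves a generalized eigenproblem built from time-lagged covariance matrices, and the nonlinear methods (UMAP, t-SNE) are not shown.

```python
import math
import random


def first_principal_component(data, iters=200, seed=0):
    """Return (direction, projections) for the leading principal component of `data`.

    data: list of equal-length feature vectors, one per frame (hypothetical input format).
    """
    n, d = len(data), len(data[0])
    # Center every feature on its mean across frames.
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Start from a random unit vector.
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    # Power iteration: repeated multiplication by cov generically converges to the
    # eigenvector with the largest eigenvalue, i.e. the direction of maximum variance.
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate case: no variance along the current vector
            break
        v = [x / norm for x in w]
    # Linear 1-D embedding: project each centered frame onto the component.
    projections = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, projections


# Toy usage: points spread mostly along x, so the component is close to the x-axis.
direction, coords = first_principal_component([[-2, 0.1], [-1, -0.1], [1, 0.1], [2, -0.1]])
```

For real trajectories a library routine such as scikit-learn's `PCA` would normally be preferred; the explicit loops only expose the linear algebra that makes PCA a linear method.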
Molecular dynamics (MD) simulations have been widely used to study protein structure and function. However, extensive sampling of the protein conformational space requires large computational resources and a prohibitive amount of time. In this study, we demonstrate that variational autoencoders (VAEs), a type of deep learning model, can be employed to explore the conformational space of a protein through MD simulations. A benchmark study shows that VAEs outperform autoencoders (AEs), with low deviation between the training and decoded conformations. Moreover, we show that the learned latent space of the VAE can be used to generate unsampled protein conformations. Additional simulations started from these generated conformations accelerated the sampling process and explored hidden regions of the conformational landscape.
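What lets a VAE generate new conformations, unlike a plain AE, is its regularized latent space. The following is a hedged sketch of the standard VAE ingredients for a diagonal Gaussian encoder, written in plain Python: the reparameterization step, the closed-form KL divergence to a standard normal prior, and a loss that adds this term to a squared-error reconstruction term. The encoder and decoder networks, the reconstruction metric, and the weight `beta` are assumptions, since the abstract does not specify them; all function names are hypothetical.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) and sigma = exp(logvar / 2).

    Writing the sample this way keeps z a differentiable function of mu and logvar,
    which is what allows the encoder to be trained by backpropagation.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL(N(mu, diag(exp(logvar))) || N(0, I)) for one latent vector."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def vae_loss(x, x_decoded, mu, logvar, beta=1.0):
    """Squared-error reconstruction plus beta-weighted KL term.

    A plain autoencoder would minimize only the reconstruction term, so nothing
    forces its latent codes to fill a known distribution that can be sampled.
    """
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_decoded))
    return reconstruction + beta * kl_to_standard_normal(mu, logvar)


def sample_prior(dim, rng):
    """Draw a latent vector from the N(0, I) prior; a trained decoder (not shown) maps it to a conformation."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]
```

Sampling a latent vector with `sample_prior`, or perturbing the encoding of a known frame, and passing it through a trained decoder is what would yield candidate conformations that were never simulated; the decoder itself is hypothetical here and omitted.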
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has altered life on a global scale. A concerted effort from research labs around the world resulted in the identification of potential pharmaceutical treatments for COVID-19 using existing drugs, as well as the discovery of multiple vaccines. During an urgent crisis, rapidly identifying potential new treatments requires global and cross-disciplinary cooperation, together with an enhanced open-access research model to distribute new ideas and leads. Herein, we introduce an application of a deep-neural-network-based drug screening method, validate it using a docking algorithm on approved drugs for drug repurposing, and extend the screen to a large library of 750,000 compounds for de novo drug discovery. The results of the large library screens are incorporated into an open-access web interface that allows researchers from diverse fields to target molecules of interest. Our combined approach allows for both the identification of existing drugs that may be repurposed and the de novo design of ACE2-regulatory compounds. Through these efforts, we demonstrate the utility of a new machine learning algorithm for drug discovery, SSnet, which can serve as a tool to triage large molecular libraries and identify classes of molecules with possible efficacy.
Computational prediction of protein-ligand interaction (PLI) is an important step in the modern drug discovery pipeline, as it reduces the cost, time, and resources required to screen novel therapeutics. Deep neural networks (DNNs) have recently shown excellent performance in PLI prediction. However, their performance depends strongly on the protein and ligand features supplied to the model. Moreover, in current models, deciphering how protein features relate to the underlying principles that govern PLI is not trivial. In this work, we developed a DNN framework named SSnet that predicts PLI using the secondary structure information of proteins, extracted as the curvature and torsion of the protein backbone. We demonstrate the performance of SSnet by comparing it against a variety of popular machine learning (ML) and non-ML models using various metrics. We visualize the intermediate layers of SSnet to show a potential latent space for proteins, in particular to extract the structural elements of a protein that the model finds influential for ligand binding, which is one of the key features of SSnet. We observed that SSnet learns information about locations in a protein where a ligand can bind, including binding sites, allosteric sites, and cryptic sites, regardless of the conformation used. We further observed that SSnet is not biased toward any specific molecular interaction and extracts the protein fold information critical for PLI prediction. Our work forms an important gateway to the general exploration of secondary structure-based deep learning (DL), which is not confined to protein-ligand interactions; as such, it will have a large impact on protein research, while being readily accessible to de novo drug designers as a standalone package.
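Curvature and torsion of a backbone trace can be computed from the Frenet formulas κ = |r' × r''| / |r'|³ and τ = ((r' × r'') · r''') / |r' × r''|². The sketch below applies them to a polyline of 3-D points (for example, consecutive Cα coordinates) using central finite differences. This discretization is an assumption for illustration, and `curvature_torsion` is a hypothetical helper; SSnet's own preprocessing may differ, for instance by smoothing or interpolating the backbone first.

```python
import math


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def curvature_torsion(points, eps=1e-12):
    """Discrete curvature and torsion along a 3-D polyline.

    points: list of (x, y, z) tuples, e.g. consecutive C-alpha coordinates.
    Returns one (kappa, tau) pair per interior index i = 2 .. len(points) - 3,
    because the third-derivative stencil needs two neighbours on each side.
    """
    result = []
    for i in range(2, len(points) - 2):
        pm2, pm1, p0, pp1, pp2 = points[i - 2:i + 3]
        # Central finite differences for r', r'' and r''' with unit index spacing.
        d1 = [(b - a) / 2.0 for a, b in zip(pm1, pp1)]
        d2 = [b - 2.0 * m + a for a, m, b in zip(pm1, p0, pp1)]
        d3 = [(b2 - 2.0 * b1 + 2.0 * a1 - a2) / 2.0
              for a2, a1, b1, b2 in zip(pm2, pm1, pp1, pp2)]
        c = _cross(d1, d2)
        c_sq = _dot(c, c)
        speed = math.sqrt(_dot(d1, d1))
        # kappa = |r' x r''| / |r'|^3 ; tau = (r' x r'') . r''' / |r' x r''|^2
        kappa = math.sqrt(c_sq) / speed ** 3 if speed > eps else 0.0
        # Torsion is undefined where the curve is locally straight; report 0.0 there.
        tau = _dot(c, d3) / c_sq if c_sq > eps else 0.0
        result.append((kappa, tau))
    return result
```

Both formulas are unchanged by a uniform rescaling of the curve parameter, so unit index spacing is harmless. On a finely sampled ideal helix r(t) = (cos t, sin t, 0.5 t) the values approach the analytic κ = 0.8 and τ = 0.4; at the coarse spacing of a real Cα trace the finite-difference values are only approximate.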
The conformationally driven allosteric protein aureochrome 1a from the diatom Phaeodactylum tricornutum (PtAu1a) differs from other light-oxygen-voltage (LOV) proteins in its uncommon structural topology. Because of this dissimilarity, the signal transduction mechanism of the PtAu1a LOV domain (AuLOV), including its flanking helices, remains unclear, which hinders the study of PtAu1a as an optogenetic tool. To clarify this mechanism, we employed a combination of tree-based machine learning models, Markov state models, machine-learning-based community analysis, and transition path theory to quantitatively analyze the allosteric process. Our results agree well with reported experimental findings and reveal that a previously overlooked Cα helix and linkers are important in promoting the protein's conformational change. This integrated approach can serve as a general workflow and be applied to other allosteric proteins to provide detailed information about their allosteric mechanisms.
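As a generic sketch of two ingredients named above, and not the authors' implementation, the plain-Python code below estimates a Markov state model transition matrix from a discretized trajectory, computes its stationary distribution, and solves for the forward committor used in transition path theory. It assumes frames have already been assigned to integer states 0 to n-1 (for example, by clustering) and that the resulting chain is ergodic and aperiodic. The estimator is a simple row normalization of transition counts; production MSM software usually also offers estimators that enforce detailed balance. All function names are hypothetical.

```python
def transition_matrix(dtraj, n_states, lag=1):
    """Row-stochastic transition matrix from a discrete trajectory at a given lag.

    dtraj: sequence of integer state labels 0 .. n_states - 1, one per frame.
    Uses simple row normalization of the count matrix (no detailed-balance constraint).
    """
    counts = [[0.0] * n_states for _ in range(n_states)]
    for a, b in zip(dtraj[:-lag], dtraj[lag:]):
        counts[a][b] += 1.0
    matrix = []
    for row in counts:
        total = sum(row)
        # States never left within the data get an all-zero row.
        matrix.append([c / total for c in row] if total > 0 else [0.0] * n_states)
    return matrix


def stationary_distribution(T, iters=1000):
    """Left eigenvector of T for eigenvalue 1 (equilibrium populations), by power iteration."""
    n = len(T)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]
        total = sum(pi)
        pi = [p / total for p in pi]
    return pi


def committor(T, source, sink, sweeps=10000):
    """Forward committor q[i]: probability of reaching `sink` before `source` from state i.

    Solves q[i] = sum_j T[i][j] * q[j] for intermediate states, with q = 0 on the
    source set and q = 1 on the sink set, by Gauss-Seidel sweeps.
    """
    n = len(T)
    q = [1.0 if i in sink else 0.0 for i in range(n)]
    for _ in range(sweeps):
        for i in range(n):
            if i not in source and i not in sink:
                q[i] = sum(T[i][j] * q[j] for j in range(n))
    return q
```

In an application like the one described, `source` and `sink` would be sets of states representing the start and end of the conformational change; which states those are is an assumption left to the analysis.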