The conformational states adopted by a polymer chain in water result from a delicate balance between intramolecular and water-mediated interactions. Representing the solvent explicitly is, however, computationally expensive, so it is often necessary to turn to implicit representations. We present a systematic derivation of implicit models of water and study how simplifying the representation of the solvent affects the conformations of hydrophobic homopolymers of varying length. Starting from the explicit, coarse-grained, single-site mW water model, we develop two implicit solvent models: one that reproduces the free energy of the contact pair between two hydrophobic monomers, and one that captures the free energies of the contact-pair minimum, the desolvation barrier, and the solvent-separated minimum. Finally, we consider vacuum simulations. We generate potentials of mean force for polymers of various lengths in explicit water, in the two implicit solvents, and in vacuum, using umbrella sampling and replica-exchange molecular dynamics simulations. Surprisingly, the vacuum simulations outperform the implicit-solvent simulations; the implicit model that includes a desolvation barrier produces spurious extended polymer conformations. © 2017 Wiley Periodicals, Inc.
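To make the notion of a pair potential of mean force concrete, the sketch below estimates W(r) for two monomers from a histogram of sampled monomer-monomer distances, using W(r) = -k_B T ln[P(r)/r²], where dividing by r² removes the spherical-shell Jacobian. This is a hedged illustration only: it assumes unbiased samples, which is not the umbrella-sampling and replica-exchange protocol described above. The function name `pmf_from_distances`, the bin settings, and the synthetic Gaussian distance samples are all hypothetical.

```python
import math
import random

KB_KJ_PER_MOL_K = 0.0083144626  # Boltzmann constant in kJ/(mol K)


def pmf_from_distances(distances, r_min, r_max, n_bins, temperature):
    """Estimate W(r) = -kT ln[P(r) / r^2] from unbiased pair-distance samples.

    Returns (bin_centers, pmf) with the PMF shifted so its minimum is zero.
    Normalization constants (sample count, bin width, 4*pi) only add a constant
    to W(r), so they are dropped. Empty bins are skipped because ln(0) is undefined.
    """
    width = (r_max - r_min) / n_bins
    counts = [0] * n_bins
    for r in distances:
        if r_min <= r < r_max:
            # Guard against floating-point round-off pushing r into bin n_bins.
            counts[min(int((r - r_min) / width), n_bins - 1)] += 1
    kt = KB_KJ_PER_MOL_K * temperature
    centers, pmf = [], []
    for i, c in enumerate(counts):
        if c == 0:
            continue
        r = r_min + (i + 0.5) * width
        centers.append(r)
        # Divide by r^2 to remove the spherical-shell volume factor before the log.
        pmf.append(-kt * math.log(c / (r * r)))
    if not pmf:
        raise ValueError("no samples fell inside [r_min, r_max)")
    w_min = min(pmf)
    return centers, [w - w_min for w in pmf]


# Hypothetical usage: synthetic distances (nm) clustered around a contact minimum.
random.seed(0)
samples = [abs(random.gauss(0.5, 0.05)) for _ in range(10000)]
r, w = pmf_from_distances(samples, 0.3, 0.7, 40, 300.0)
```

With biased umbrella windows, the per-window histograms would first have to be unbiased and combined (for example with WHAM) before applying the same Boltzmann inversion.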
Variational autoencoders are artificial neural networks that can reduce high-dimensional data to lower-dimensional latent representations. In this work, these models are applied to molecular dynamics simulations of the self-assembly of coarse-grained peptides to obtain a single-valued order parameter for amyloid aggregation. This automatically learned order parameter is constructed by time-averaging the latent parametrizations of internal-coordinate representations, and it is compared with the nematic order parameter, which is commonly used in the literature to study ordering in similar systems. The latent-space value provides more detailed insight into the aggregation mechanism, correctly identifying fibril formation in cases where the nematic order parameter fails to do so. A procedure is also provided for analyzing the latent-space value to identify the internal coordinates that contribute most to it, allowing a direct interpretation of the latent-space order parameter in terms of the behavior of the system. The latent model is found to be an effective and convenient way of representing the data from the dynamic ensemble, and it reduces the dimensionality of a system larger than the molecular systems considered so far with similar tools. This removes the need for researchers to guess which elements of a system best summarize its major transitions, and suggests that latent models are effective and insightful when applied to large systems with diverse, complex behaviors.
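For reference, the nematic order parameter used as the baseline above is conventionally the largest eigenvalue S of the Q-tensor Q = <(3 u u^T - I)/2>, averaged over unit axis vectors u. The sketch below computes it with NumPy. It assumes each peptide has already been reduced to one axis vector (for example an end-to-end vector); that choice, and the function name `nematic_order_parameter`, are hypothetical and not taken from the work.

```python
import numpy as np


def nematic_order_parameter(vectors):
    """Return S, the largest eigenvalue of the nematic Q-tensor.

    `vectors` is an (N, 3) array of molecular axis vectors (need not be unit length).
    S is 1 for perfect (anti)parallel alignment and close to 0 for isotropic orientations.
    """
    u = np.asarray(vectors, dtype=float)
    u = u / np.linalg.norm(u, axis=1, keepdims=True)  # normalize each axis vector
    # Q = <(3 u u^T - I) / 2>, averaged over all molecules
    q = 1.5 * np.einsum("ni,nj->ij", u, u) / len(u) - 0.5 * np.eye(3)
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order
    return float(np.linalg.eigvalsh(q)[-1])
```

Because u and -u contribute identically to Q, S is insensitive to head-to-tail flips. That same global averaging is one plausible reason a single S value can miss local fibril formation in a large, heterogeneous system.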
The need for careful assembly, training, and validation of quantitative structure–activity/property relationship (QSAR/QSPR) models is more significant than ever as data sets grow larger and sophisticated machine learning tools become increasingly ubiquitous and accessible to the scientific community. Regulatory agencies such as the United States Environmental Protection Agency must carefully scrutinize each aspect of a QSAR/QSPR model to determine its potential use in environmental exposure and hazard assessment. Herein, we revisit the goals of the Organisation for Economic Co-operation and Development (OECD) and discuss its validation principles for structure–activity models. We apply these principles to a model for predicting the water solubility of organic compounds built with random forest regression, a common machine learning approach in the QSAR/QSPR literature. Using public sources, we carefully assembled and curated a data set of 10,200 unique chemical structures with associated water solubility measurements. We then used this data set as a case study to methodically consider the OECD's QSAR/QSPR principles and how they can be applied to random forests. Despite some expert, mechanistically informed supervision of descriptor selection to enhance model interpretability, we obtained a water solubility model with performance comparable to previously published models (5-fold cross-validated R² = 0.81, RMSE = 0.98). We hope this work will catalyze a necessary conversation about cautiously modernizing and explicitly leveraging the OECD principles while pursuing state-of-the-art machine learning approaches to derive QSAR/QSPR models suitable for regulatory consideration.
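The 5-fold cross-validated R² and RMSE quoted above can be computed along the lines of the sketch below, which uses scikit-learn's `RandomForestRegressor` with `cross_val_predict`. This is an illustration under stated assumptions, not the authors' pipeline: the descriptors and targets are synthetic stand-ins, `cv_r2_rmse` is a hypothetical helper, and the hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import KFold, cross_val_predict


def cv_r2_rmse(X, y, n_splits=5, seed=0):
    """Return (R^2, RMSE) from k-fold cross-validated random forest predictions.

    Each sample is predicted by a forest that never saw it during training;
    both metrics are computed once over the pooled out-of-fold predictions.
    """
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    y_pred = cross_val_predict(model, X, y, cv=folds)
    rmse = float(np.sqrt(mean_squared_error(y, y_pred)))
    return float(r2_score(y, y_pred)), rmse


# Synthetic stand-in data: 500 "compounds" x 10 hypothetical descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * rng.normal(size=500)
r2, rmse = cv_r2_rmse(X, y)
```

Pooling out-of-fold predictions is one common convention. Averaging per-fold scores gives slightly different numbers, and the abstract does not state which convention was used.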
Despite the importance of amyloid formation in disease pathology, the primary structure–activity relationship of amyloid-forming peptides remains poorly understood. Here we use a new neural-network-based method of analysis: the classifying autoencoder (CAE). This machine learning technique uses a specialized artificial neural network architecture to provide insight into typically opaque classification processes. The method proves robust to noisy and limited data sets and can disentangle relatively complicated rules within them. We demonstrate its capabilities by applying it to an experimental database (the Waltz database) and show that the CAE can provide insight into a novel descriptor, dimeric isotropic deviation, which is an experimental measure of the aggregation properties of amino acids. We measure this value for all 20 common amino acids and find a correlation between dimeric isotropic deviation and failure to form amyloids when hydrophobic effects are not the primary driving force of amyloid formation. These applications show the value of the new method and provide a flexible and general framework for approaching problems in biochemistry with artificial neural networks.
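A neural-network classifier such as the CAE needs peptide sequences turned into numeric vectors. The abstract does not say how this is done, so the sketch below assumes the simplest common choice: a position-by-residue one-hot encoding of fixed-length sequences such as hexapeptides. The helper `one_hot`, the `INDEX` table, and the example sequence are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 common amino acids, one-letter codes
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(peptide, length=6):
    """Flatten a fixed-length peptide into a position-by-residue one-hot vector.

    Position p and residue index a map to feature p * 20 + a, so a hexapeptide
    becomes a 120-element list of 0.0/1.0 values with exactly six ones.
    """
    peptide = peptide.upper()
    if len(peptide) != length:
        raise ValueError(f"expected {length} residues, got {len(peptide)}")
    features = [0.0] * (length * len(AMINO_ACIDS))
    for pos, aa in enumerate(peptide):
        if aa not in INDEX:
            raise ValueError(f"unknown residue {aa!r} at position {pos}")
        features[pos * len(AMINO_ACIDS) + INDEX[aa]] = 1.0
    return features


x = one_hot("STVIIE")  # hypothetical example hexapeptide
```

A one-hot input treats all residues as equally dissimilar. Per-residue physicochemical descriptors, such as the dimeric isotropic deviation measured for each amino acid, could be appended as extra features so the network can relate them to the classification.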