ISIDA Property-Labelled Fragment Descriptors (IPLF) were introduced as a general framework for numerically encoding molecular structures in chemoinformatics as counts of specific subgraphs in which atom vertices are coloured according to some local property or feature. By combining various colouring strategies of the molecular graph (notably pH-dependent pharmacophore flagging and electrostatic potential-based flagging) with several fragmentation schemes, the different IPLF subtypes range from classical atom pair and sequence counts to monitoring the population levels of branched fragments or feature multiplets. Because pH-dependent feature flagging is carried out for each significantly populated microspecies involved in the protolytic equilibrium, it may also provide a competitive advantage over classical descriptors, even when the chosen fragmentation scheme is one of the state-of-the-art pattern extraction procedures in chemoinformatics (feature sequence or pair counts, etc.). The implemented fragmentation schemes support counting (1) linear feature sequences, (2) feature pairs, (3) circular feature fragments, a.k.a. "augmented atoms", or (4) feature trees. Fuzzy rendering, which optionally counts nonterminal fragment atoms as wildcards regardless of their specific colours/features, ensures a seamless transition between the "strict" counts (sequences or circular fragments) and the "fuzzy" multiplet counts (pairs or trees). Bond information may also be represented or ignored, leaving the user a wide choice of the level of resolution at which chemical information is extracted into the descriptors. Selected IPLF subsets, tree descriptors in particular, were successfully tested in both neighbourhood behaviour and QSAR modelling challenges, with very promising results.
They showed excellent results in similarity-based virtual screening for analogue protease inhibitors, and generated highly predictive octanol-water partition coefficient and hERG channel inhibition models.
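The difference between strict and fuzzy counts can be made concrete with a minimal Python sketch. It is not the ISIDA implementation: the function names, the toy graph (the heavy atoms of ethanolamine, NCCO) and its colour labels ("Pos", "Hy", "HDA") are hypothetical. Bonds are ignored, which is one of the two options described above. Only linear sequences and pairs are covered, not circular fragments or trees.

```python
from collections import Counter, deque


def simple_paths(adj, start, max_atoms):
    """Yield every simple path (list of atom indices, no repeated atoms)
    that starts at `start` and contains 1..max_atoms atoms."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        yield path
        if len(path) < max_atoms:
            for nb in adj[path[-1]]:
                if nb not in path:
                    stack.append(path + [nb])


def sequence_counts(adj, colours, min_atoms=2, max_atoms=4, fuzzy=False):
    """Count coloured linear sequences; with fuzzy=True, interior atoms
    are replaced by the wildcard '*' and only the termini keep colours."""
    counts = Counter()
    for start in adj:
        for path in simple_paths(adj, start, max_atoms):
            if len(path) < min_atoms:
                continue
            # Each undirected path is found once from each end; keep one copy.
            if len(path) > 1 and path[0] > path[-1]:
                continue
            labels = [colours[a] for a in path]
            if fuzzy and len(labels) > 2:
                labels = [labels[0]] + ["*"] * (len(labels) - 2) + [labels[-1]]
            # Canonical direction: the lexicographically smaller reading.
            key = min(tuple(labels), tuple(reversed(labels)))
            counts["-".join(key)] += 1
    return counts


def pair_counts(adj, colours, max_dist=3):
    """Count unordered colour pairs at topological distance 1..max_dist,
    using breadth-first search for shortest-path distances."""
    counts = Counter()
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            a = queue.popleft()
            for nb in adj[a]:
                if nb not in dist:
                    dist[nb] = dist[a] + 1
                    queue.append(nb)
        for dst, d in dist.items():
            if src < dst and d <= max_dist:
                c1, c2 = sorted((colours[src], colours[dst]))
                counts[f"{c1}-{c2}@{d}"] += 1
    return counts


# Hypothetical toy graph: heavy atoms of ethanolamine, N0-C1-C2-O3.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
colours = {0: "Pos", 1: "Hy", 2: "Hy", 3: "HDA"}

print(sequence_counts(adj, colours))              # strict sequences
print(sequence_counts(adj, colours, fuzzy=True))  # fuzzy sequences
print(pair_counts(adj, colours))                  # feature pairs
```

Because this toy graph is acyclic, each fuzzy sequence of n atoms (for example HDA-*-*-Pos) corresponds to exactly one pair at topological distance n - 1 (HDA-Pos@3). In cyclic graphs the two counts diverge, because several paths may connect the same two atoms.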
The deuteration of proteins and the selective labeling of side-chain methyl groups have greatly extended the molecular weight range of proteins and protein complexes that can be studied by solution NMR spectroscopy. Protocols for selectively labeling each of the six methyl-containing amino acids individually are available; however, to date at most five amino acids have been labeled simultaneously. Here, we describe a new methodology for the simultaneous, selective labeling of all six methyl-containing amino acids, using the 115 kDa homohexameric enzyme CoaD from E. coli as a model system. The utility of the labeling protocol is demonstrated by efficiently and unambiguously assigning all methyl groups in the enzymatic active site using a single 4D ¹³C-resolved HMQC-NOESY-HMQC experiment in conjunction with a crystal structure. Furthermore, the six-fold labeled protein was used to characterize the interaction between the substrate analogue (R)-pantetheine and CoaD by chemical shift perturbations, demonstrating the benefit of the increased probe density.
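Chemical shift perturbations from ¹H-¹³C methyl correlation spectra recorded with and without ligand are often summarised as one weighted distance per probe. The Python sketch below assumes the common form Δδ = sqrt(ΔδH² + (α·ΔδC)²) with α = 0.25, plus a mean-plus-one-standard-deviation cutoff for flagging perturbed probes. Both are assumed conventions, not necessarily those used for CoaD. The probe labels and shifts are invented for illustration.

```python
import math
import statistics


def combined_csp(free, bound, alpha=0.25):
    """Weighted 1H/13C chemical shift perturbation for each methyl probe.

    free, bound: dicts mapping a probe label to (delta_H, delta_C) in ppm.
    alpha: 13C scaling factor (assumed value; published conventions vary).
    """
    csp = {}
    for probe, (h_free, c_free) in free.items():
        if probe not in bound:
            continue  # peak not followed in the ligand-bound spectrum
        h_bound, c_bound = bound[probe]
        csp[probe] = math.hypot(h_bound - h_free, alpha * (c_bound - c_free))
    return csp


def flag_perturbed(csp, n_sd=1.0):
    """Return probes whose CSP exceeds mean + n_sd * (population) SD."""
    values = list(csp.values())
    cutoff = statistics.mean(values) + n_sd * statistics.pstdev(values)
    return sorted(p for p, v in csp.items() if v > cutoff)


# Invented peak positions (ppm) for four hypothetical methyl probes.
free = {"A": (0.80, 13.50), "B": (0.90, 24.10), "C": (1.05, 21.30), "D": (2.05, 17.00)}
bound = {"A": (0.82, 13.60), "B": (0.60, 23.10), "C": (1.05, 21.30), "D": (2.06, 17.04)}

csp = combined_csp(free, bound)
print({p: round(v, 3) for p, v in csp.items()})
print(flag_perturbed(csp))
```

Flagged probes would then be mapped onto the assigned active-site methyl groups and the crystal structure. A denser set of labeled methyl probes gives more such reporters around the binding site.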
Here, we introduce new ISIDA fragment descriptors able to describe "local" properties related to selected atoms or molecular fragments. These descriptors have been applied to QSPR modelling of the H-bond basicity scale pKBHX, which is based on the 1:1 complexation constant of a series of organic acceptors (H-bond bases) with 4-fluorophenol as the reference H-bond donor in CCl₄ at 298 K. Unlike previous QSPR studies of H-bond complexation, the models based on these new descriptors can predict the H-bond basicity of different acceptor centres on the same polyfunctional molecule. QSPR models were obtained using support vector machine and ensemble multiple linear regression methods on a set of 537 organic compounds, including 5 bifunctional molecules. They were validated by cross-validation and on two external test sets. The best model shows good predictive performance on a large test set of 451 mono- and bifunctional molecules, with a root-mean-square error RMSE = 0.26 and a determination coefficient R² = 0.91. It is implemented on our website (http://infochim.u-strasbg.fr/webserv/VSEngine.html), together with an estimation of its applicability domain and automatic detection of potential H-bond acceptors.
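The key property of "local" descriptors is that two acceptor centres of the same molecule receive different fragment counts. A minimal Python sketch can show this using atom-centred sequences that all start at the selected atom. This is an assumption about how such descriptors could be built, not the published ISIDA definition. The function name, the toy molecule (the heavy atoms of 4-aminobutan-2-one, CC(=O)CCN, coloured by element symbol) and the bond notation are hypothetical.

```python
from collections import Counter


def local_sequence_counts(adj, labels, centre, max_atoms=3, bonds=None):
    """Count linear sequences of 1..max_atoms atoms that start at `centre`.

    bonds: optional dict mapping frozenset({i, j}) to a bond symbol;
    bonds not listed are written as '-' (single).
    """
    bonds = bonds or {}
    counts = Counter()
    stack = [[centre]]
    while stack:
        path = stack.pop()
        # Build a label such as "O=C-C", reading outwards from the centre.
        text = labels[path[0]]
        for a, b in zip(path, path[1:]):
            text += bonds.get(frozenset((a, b)), "-") + labels[b]
        counts[text] += 1
        if len(path) < max_atoms:
            stack.extend(path + [nb] for nb in adj[path[-1]] if nb not in path)
    return counts


# Hypothetical example: heavy atoms of 4-aminobutan-2-one, CC(=O)CCN.
adj = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1, 4], 4: [3, 5], 5: [4]}
labels = {0: "C", 1: "C", 2: "O", 3: "C", 4: "C", 5: "N"}
bonds = {frozenset((1, 2)): "="}

for centre in (2, 5):  # carbonyl O and amine N: two candidate acceptor centres
    print(centre, dict(local_sequence_counts(adj, labels, centre, bonds=bonds)))
```

The carbonyl oxygen (atom 2) yields O, O=C and two O=C-C sequences. The amine nitrogen (atom 5) yields N, N-C and N-C-C. A regression model trained on such per-centre vectors can therefore assign each acceptor centre its own basicity estimate.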