The PharmMapper online tool is a web server for potential drug target identification by reverse pharmacophore matching of a query compound against an in-house pharmacophore model database. The original version of PharmMapper includes more than 7000 target pharmacophores derived from complex crystal structures with corresponding protein target annotations. In this article, we present a new version of the PharmMapper web server, whose backend pharmacophore database is six times larger than the earlier one, with a total of 23 236 proteins covering 16 159 druggable pharmacophore models and 51 431 ligandable pharmacophore models. The expanded target data cover 450 indications and 4800 molecular functions, compared to 110 indications and 349 molecular functions in our last update. In addition, the new web server provides a statistically meaningful ranking of the identified drug targets, achieved through the use of standard scores. It also features an improved user interface. The web server is freely available at http://lilab.ecust.edu.cn/pharmmapper/.
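The ranking by standard scores mentioned above can be illustrated with a minimal Python sketch. It is not PharmMapper's implementation: the abstract does not say which reference distribution is used, so the sketch assumes each target has its own background set of fit scores and converts a query's raw fit score to a z-score against that set. The names `z_score`, `rank_targets`, `query_scores`, and `background_scores` are hypothetical.

```python
from statistics import mean, stdev


def z_score(raw, background):
    """Standard score of `raw` against a background sample (needs >= 2 values)."""
    mu = mean(background)
    sigma = stdev(background)  # sample standard deviation
    if sigma == 0:
        return 0.0  # degenerate background: no spread to normalise against
    return (raw - mu) / sigma


def rank_targets(query_scores, background_scores):
    """Rank targets by z-score, highest first.

    query_scores: hypothetical mapping target -> raw fit score of the query.
    background_scores: hypothetical mapping target -> list of fit scores
    obtained for that target from a reference set of compounds (assumption).
    """
    scored = [
        (target, z_score(raw, background_scores[target]))
        for target, raw in query_scores.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Under this assumption, a target with a lower raw fit score can still rank first if that score is unusually high relative to the target's own background, which is the point of normalising before ranking.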
CavityPlus is a web server that offers protein cavity detection and various functional analyses. Using protein three-dimensional structural information as the input, CavityPlus applies CAVITY to detect potential binding sites on the surface of a given protein structure and rank them based on ligandability and druggability scores. These potential binding sites can be further analysed using three submodules, CavPharmer, CorrSite, and CovCys. CavPharmer uses a receptor-based pharmacophore modelling program, Pocket, to automatically extract pharmacophore features within cavities. CorrSite identifies potential allosteric ligand-binding sites based on motion correlation analyses between cavities. CovCys automatically detects druggable cysteine residues, which is especially useful to identify novel binding sites for designing covalent allosteric ligands. Overall, CavityPlus provides an integrated platform for analysing comprehensive properties of protein binding cavities. Such analyses are useful for many aspects of drug design and discovery, including target selection and identification, virtual screening, de novo drug design, and allosteric and covalent-binding drug design. The CavityPlus web server is freely available at http://repharma.pku.edu.cn/cavityplus or http://www.pkumdl.cn/cavityplus.
General approaches for designing sequence-specific peptide-binding proteins would have wide utility in proteomics and synthetic biology. However, designing peptide-binding proteins is challenging, as most peptides do not have defined structures in isolation, and hydrogen bonds must be made to the buried polar groups in the peptide backbone [1–3]. Here, inspired by natural and re-engineered protein–peptide systems [4–11], we set out to design proteins made out of repeating units that bind peptides with repeating sequences, with a one-to-one correspondence between the repeat units of the protein and those of the peptide. We use geometric hashing to identify protein backbones and peptide-docking arrangements that are compatible with bidentate hydrogen bonds between the side chains of the protein and the peptide backbone [12]. The remainder of the protein sequence is then optimized for folding and peptide binding. We design repeat proteins to bind to six different tripeptide-repeat sequences in polyproline II conformations. The proteins are hyperstable and bind to four to six tandem repeats of their tripeptide targets with nanomolar to picomolar affinities in vitro and in living cells. Crystal structures reveal repeating interactions between protein and peptide as designed, including ladders of hydrogen bonds from protein side chains to peptide backbones. By redesigning the binding interfaces of individual repeat units, specificity can be achieved for non-repeating peptide sequences and for disordered regions of native proteins.
General approaches for designing sequence-specific peptide-binding proteins would have wide utility in proteomics and synthetic biology. Although considerable progress has been made in designing proteins that bind to other proteins, the general peptide-binding problem is more challenging: most peptides do not have defined structures in isolation, and, to offset the loss in solvation upon binding, the protein binding interface has to provide specific hydrogen bonds that complement the majority of the buried peptide backbone polar groups. Inspired by natural repeat protein-peptide complexes, and by engineering efforts to alter their specificity, we describe a general approach for de novo design of proteins made out of repeating units that bind peptides with repeating sequences, such that there is a one-to-one correspondence between repeat units on the protein and peptide. We develop a rapid docking plus geometric hashing method to identify protein backbones and protein-peptide rigid-body arrangements that are compatible with bidentate hydrogen bonds between side chains on the protein and the backbone of the peptide; the remainder of the protein sequence is then designed using Rosetta to incorporate additional interactions with the peptide and drive folding to the desired structure. We use this approach to design, from scratch, alpha-helical repeat proteins that bind six different tripeptide repeat sequences (PLP, LRP, PEW, IYP, PRM and PKW) in near polyproline II helical conformations. The proteins are expressed at high levels in E. coli, are hyperstable, and bind peptides with 4-6 copies of the target tripeptide sequences with nanomolar to picomolar affinities both in vitro and in living cells. Crystal structures reveal repeating interactions between protein and peptide as designed, including a ladder of protein side chain to peptide backbone hydrogen bonds.
By redesigning the binding interfaces of individual repeat units, specificity can be achieved for non-repeating sequences, and for naturally occurring proteins containing disordered regions. Our approach provides a general route to designing specific binding proteins for a broad range of repeating and non-repetitive peptide sequences.
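The geometric hashing step described in this abstract can be illustrated, very loosely, with a Python sketch. This is not the authors' method: their approach hashes full rigid-body arrangements, whereas the sketch below bins only a 3D translation vector, and the bin width, the entry labels, and the names `bin_key`, `build_table`, and `lookup` are all hypothetical. It only shows the core idea that precomputed compatible geometries are stored under a quantised key so that candidate docks can be checked by constant-time lookup.

```python
from collections import defaultdict
from math import floor

BIN_WIDTH = 1.0  # hypothetical bin width (angstroms); an assumption, not from the source


def bin_key(offset, bin_width=BIN_WIDTH):
    """Quantise a 3D offset vector into an integer grid cell used as the hash key."""
    return tuple(floor(component / bin_width) for component in offset)


def build_table(entries, bin_width=BIN_WIDTH):
    """Build the hash table from (offset, label) pairs.

    Each label stands in for a precomputed side-chain placement that forms a
    bidentate hydrogen bond at that relative geometry (hypothetical data).
    """
    table = defaultdict(list)
    for offset, label in entries:
        table[bin_key(offset, bin_width)].append(label)
    return table


def lookup(table, offset, bin_width=BIN_WIDTH):
    """Return all stored labels whose offset falls in the same grid cell."""
    return table.get(bin_key(offset, bin_width), [])
```

A real implementation would also need to encode orientation and to query neighbouring cells, since two nearly identical geometries can fall on opposite sides of a bin boundary.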
Three-dimensional chromosomal structure plays an important role in gene regulation. Chromosome conformation capture techniques, especially the high-throughput, sequencing-based technique Hi-C, provide new insights into the spatial architecture of chromosomes. However, Hi-C data contain artifacts and systematic biases that substantially influence subsequent analysis. Computational models have been developed to address these biases explicitly; however, it is difficult to enumerate and eliminate all the biases in such models. Other models are designed to correct biases implicitly, but they also break down in some situations, such as in the presence of copy number variations. We characterize a new kind of artifact in Hi-C data. We find that this artifact is caused by incorrect alignment of Hi-C reads against approximate repeat regions and can lead to erroneous chromatin contact signals. The artifact cannot be corrected by current Hi-C correction methods. We design a probabilistic method and develop a new Hi-C processing pipeline by integrating our probabilistic method with the HiC-Pro pipeline. We find that the new pipeline can remove this new artifact effectively, while preserving important features of the original Hi-C matrices.
Topological data analysis (TDA) is a mathematically well-founded set of methods to derive robust information about the structure and topology of data sets, and has been applied successfully in several biological contexts. Derived primarily from algebraic topology, TDA rigorously identifies persistent features in complex data, making it well-suited to better understand the key features of three-dimensional chromosome structure. Chromosome structure has a significant influence on many diverse genomic processes and has recently been shown to relate to cellular differentiation. While there exist many methods to study specific substructures of chromosomes, we are still missing a global view of all geometric features of chromosomes. By applying TDA to the study of chromosome structure through differentiation across three cell lines, we provide insight into principles of chromosome folding and looping. We identify persistent connected components and one-dimensional topological features of chromosomes, and characterize them across cell types and stages of differentiation. Availability: Scripts to reproduce the results from this study can be found at https://github.com/Kingsford-Group/hictda
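The notion of persistent connected components can be made concrete with a small Python sketch of zero-dimensional persistence. It is an illustration only, not the code from the linked repository: it assumes the input is a list of point coordinates and uses Euclidean distance, whereas the abstract does not describe how Hi-C data are turned into a filtration. Every component is born at scale 0; the function `h0_persistence` (a hypothetical name) returns the scales at which components merge, computed with a union-find structure over edges sorted by length.

```python
from itertools import combinations
from math import dist


def h0_persistence(points):
    """Death scales of connected components in a distance-based filtration.

    Returns n - 1 finite death values in increasing order for n points;
    the one remaining component never dies and is not listed.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Find the component representative, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, shortest first.
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # two components merge: one of them dies
            deaths.append(length)
    return deaths
```

Large gaps between consecutive death values indicate well-separated clusters: components that survive over a long range of scales are the persistent ones.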