Mutations that result in amino acid substitutions influence the stability of proteins and their binding to biomolecules. A molecular understanding of the effects of protein mutations is of both biotechnological and medical relevance. Empirical free energy functions that quickly estimate the free energy change upon mutation (ΔΔG) can be exploited for systematic screenings of proteins and protein complexes. In silico saturation mutagenesis can guide the design of new experiments or rationalize the consequences of known mutations. Software such as FoldX, while fast and reliable, often lacks the automation features needed to apply it in a high-throughput manner. We introduce MutateX, a software to automate the prediction of ΔΔGs associated with the systematic mutation of each residue within a protein or protein complex to all other possible residue types, using the FoldX energy function. MutateX also supports ΔΔG calculations over protein ensembles, upon post-translational modifications and in multimeric assemblies. At the heart of MutateX lies an automated pipeline engine that handles input preparation and parallelization and outputs publication-ready figures. We illustrate the MutateX protocol applied to different case studies. The results of the high-throughput scan provided by our tools can help in different applications, such as analyzing disease-associated mutations, complementing experimental deep mutational scans, or assisting the design of variants for industrial applications. MutateX is a collection of Python tools that relies on open-source libraries. It is available free of charge under the GNU General Public License from https://github.com/ELELAB/mutatex.
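To make the size of such a saturation scan concrete, the following is a minimal sketch that only enumerates the single substitutions the abstract describes (each residue mutated to all other residue types). It does not call FoldX or MutateX; the function name `saturation_mutations`, the label format and the toy sequence are assumptions made for illustration.

```python
# Hypothetical sketch: enumerate the substitutions covered by a saturation scan.
# This is NOT the MutateX API; names and label format are illustrative only.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def saturation_mutations(sequence, start=1):
    """Return labels such as 'M1A' for every single substitution at every position.

    `start` is the residue number assigned to the first character of `sequence`.
    """
    mutations = []
    for offset, wild_type in enumerate(sequence):
        position = start + offset
        for mutant in AMINO_ACIDS:
            if mutant != wild_type:  # skip the self-substitution
                mutations.append(f"{wild_type}{position}{mutant}")
    return mutations


if __name__ == "__main__":
    toy = saturation_mutations("MKT")  # toy three-residue sequence
    print(len(toy))   # 3 positions x 19 substitutions each
    print(toy[:3])
```

For a protein of N standard residues this gives 19 × N ΔΔG calculations per structure, which is why automation and parallelization matter for this kind of scan.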
Cancer is a complex group of diseases caused by the accumulation of mutations in tumor suppressors or oncogenes in the genome. Cancer alterations can be very heterogeneous, even in tumors from the same tissue, affecting the response to treatment and the risk of relapse in different patients. The role of genomic variants in cancer predisposition, progression, and response to treatment continues to be realized. Thanks to advances in sequencing techniques and their introduction in clinical settings, the number of genomic variants discovered is growing exponentially. Many of these variants are classified as Variants of Uncertain Significance (VUS), while others have been reported with conflicting evidence. Bioinformatics-based approaches to characterizing the effects of these variants have demonstrated their full potential thanks to advances in machine learning, comparisons between predicted effects and cellular readouts, and advances in structural biology and biomolecular simulations. We here introduce MAVISp (Multi-layered Assessment of VarIants by Structure for proteins), a modular structure-based framework for the annotation and classification of the impact of variants that affect the coding region of genes and the corresponding protein product, together with a Streamlit-based web application (https://github.com/ELELAB/MAVISp) where the variants and the data generated by the assessment are made available to the community for consultation or further studies. Currently, MAVISp includes information for ten different proteins and more than 4000 variants. New protein targets are routinely analyzed in batches through standardized Python-based workflows and high-throughput free energy calculations and biomolecular simulations. We also illustrate the potential of the approach for each protein included in the database. New variants will be deposited on a regular basis or in connection with future publications in which the approach is applied. Finally, we provide guidelines for new contributors who wish to add to the collection in relation to their research.
Reliable prediction of free energy changes upon amino acid substitutions (ΔΔGs) is crucial to investigate their impact on protein stability and protein-protein interactions. Advances in experimental mutational scans allow high-throughput studies thanks to multiplex techniques. At the same time, genomics initiatives provide a large amount of data on disease-related variants that can benefit from analyses with structure-based methods. The computational field should therefore keep pace and provide new tools for fast and accurate high-throughput ΔΔG calculations. In this context, the Rosetta modeling suite implements effective approaches to predict folding/unfolding ΔΔGs in a protein monomer upon amino acid substitutions and to calculate the changes in binding free energy in protein complexes. However, these approaches can be challenging to apply for users without extensive experience with Rosetta. Furthermore, Rosetta protocols for ΔΔG prediction are designed to consider one variant at a time, making the setup of high-throughput screenings cumbersome. For these reasons, we devised RosettaDDGPrediction, a customizable Python wrapper designed to run free energy calculations on a set of amino acid substitutions using Rosetta protocols with little intervention from the user. Moreover, RosettaDDGPrediction assists with checking completed runs and aggregates raw data for multiple variants, as well as generates publication-
Valentina Sora and Adrian Otamendi Laspiur contributed equally to this study.
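As a rough illustration of what aggregating raw data for multiple variants can involve, the sketch below averages per-structure ΔΔG values for each variant. It assumes the common convention ΔΔG = ΔG(mutant) − ΔG(wild type), so positive values indicate destabilization. The input layout, the function name `aggregate_ddg` and the numbers are hypothetical and do not reflect the RosettaDDGPrediction output format or Rosetta itself.

```python
# Hypothetical sketch: average ddG per variant over several modelled structures.
# Assumed convention: ddG = dG(mutant) - dG(wild type); positive = destabilizing.
# The data layout below is invented for illustration, not a real output format.
from statistics import mean


def aggregate_ddg(raw_scores):
    """Map each variant to its mean ddG.

    `raw_scores` maps a variant label to a list of
    (wild_type_energy, mutant_energy) pairs, one pair per structure.
    """
    summary = {}
    for variant, pairs in raw_scores.items():
        ddgs = [mutant - wild_type for wild_type, mutant in pairs]
        summary[variant] = mean(ddgs)
    return summary


if __name__ == "__main__":
    toy_scores = {  # toy numbers, not real predictions
        "A10G": [(-100.0, -98.0), (-101.0, -98.0)],
        "L25P": [(-100.0, -95.0), (-101.0, -95.0)],
    }
    for variant, ddg in aggregate_ddg(toy_scores).items():
        print(variant, ddg)
```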
There has been increasing interest in the role of T cells and their involvement in cancer, autoimmune and infectious diseases. However, the nature of T cell receptor (TCR) epitope recognition at a repertoire level is not yet fully understood. Due to technological advances, a plethora of TCR sequences from a variety of disease and treatment settings has become readily available. Current efforts in TCR specificity analysis focus on identifying characteristics in immune repertoires that can explain or predict disease outcome or progression, or that can be used to monitor the efficacy of disease therapy. In this context, clustering TCRs by sequence to reflect biological similarity, and especially antigen specificity, has become of paramount importance. We review the main TCR sequence clustering methods and the different similarity measures they use, and discuss their performance and possible improvements. We aim to provide guidance for non-specialists who wish to use TCR repertoire sequencing for disease tracking, patient stratification or therapy prediction, and to provide a starting point for those aiming to develop novel techniques for TCR annotation through clustering.
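As one simple example of sequence-based clustering with an explicit similarity measure, the sketch below links CDR3 amino acid sequences of equal length that differ at no more than one position (Hamming distance) and reports the connected groups (single linkage). This is an illustrative choice and is not claimed to be any specific method covered by the review; the function names, the threshold and the toy sequences are assumptions.

```python
# Hypothetical sketch: single-linkage clustering of CDR3 sequences by Hamming distance.
# Threshold, names and sequences are illustrative assumptions only.
from itertools import combinations


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def cluster_cdr3(sequences, max_distance=1):
    """Group sequences connected by chains of pairs within `max_distance`."""
    parent = {seq: seq for seq in sequences}  # union-find forest, duplicates collapse

    def find(seq):
        while parent[seq] != seq:
            parent[seq] = parent[parent[seq]]  # path halving
            seq = parent[seq]
        return seq

    for a, b in combinations(parent, 2):
        if len(a) == len(b) and hamming(a, b) <= max_distance:
            parent[find(a)] = find(b)  # merge the two groups

    clusters = {}
    for seq in parent:
        clusters.setdefault(find(seq), []).append(seq)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    toy = ["CASSLGQYF", "CASSLGEYF", "CASSLAEYF", "CASRPDTQYF"]  # toy sequences
    for members in cluster_cdr3(toy):
        print(members)
```

The all-pairs comparison is quadratic in the number of sequences, so a sketch like this only suits small repertoires; edit distance or k-mer based measures could be substituted for the Hamming distance to handle length differences.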