Self-assembling proteins are critical to biological systems and industrial technologies, but predicting how mutations affect self-assembly remains a significant challenge. Here, we report a technique, termed SyMAPS (Systematic Mutation and Assembled Particle Selection), that can be used to characterize the assembly competency of all single amino acid variants of a self-assembling viral structural protein. SyMAPS studies on the MS2 bacteriophage coat protein revealed a high-resolution fitness landscape that challenges some conventional assumptions of protein engineering. An additional round of selection identified a previously unknown variant (CP[T71H]) that is stable at neutral pH but less tolerant to acidic conditions than the wild-type coat protein. The capsids formed by this variant could be more amenable to disassembly in late endosomes or early lysosomes—a feature that is advantageous for delivery applications. In addition to providing a mutability blueprint for virus-like particles, SyMAPS can be readily applied to other self-assembling proteins.
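Selection-based deep mutational scans such as SyMAPS typically score each single amino acid variant by how strongly it is enriched after selection (here, assembly into particles) relative to wild type. The sketch below computes such a score under stated assumptions: the score is the log10 ratio of post-selection to pre-selection counts, normalized to wild type, with a small pseudocount. The exact definition used in SyMAPS may differ. The function name, the `"WT"` key, and the pseudocount value are hypothetical.

```python
import math


def apparent_fitness(pre_counts, post_counts, wt_key="WT", pseudocount=0.5):
    """Hypothetical log10 enrichment score per variant, normalized to wild type.

    pre_counts / post_counts map variant names to read counts before and after
    selection. Library totals cancel once each variant's enrichment is divided
    by the wild-type enrichment, so raw counts are sufficient here.
    """

    def enrichment(key):
        # The pseudocount keeps variants that vanish after selection finite.
        post = post_counts.get(key, 0) + pseudocount
        pre = pre_counts.get(key, 0) + pseudocount
        return post / pre

    wt = enrichment(wt_key)
    # 0 = wild-type-like; negative = depleted (assembly-deficient);
    # positive = enriched relative to wild type.
    return {key: math.log10(enrichment(key) / wt) for key in pre_counts}
```

For example, a variant whose counts fall tenfold relative to wild type through selection scores -1, and one that rises tenfold scores +1.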
Protein evolution follows restricted evolutionary paths that are shaped by both previous and subsequent mutations. This effect, termed epistasis, is critical in population genetics, drug resistance, and immune escape; however, its effect at the level of protein fitness is less well characterized. We generated and characterized a 6615-member library of all two-amino acid combinations in a highly mutable loop of a virus-like particle. This particle is a model of protein self-assembly and a promising vehicle for drug delivery and imaging. In addition to characterizing the effect of all double mutants on assembly, thermostability, and acid stability, we observed many instances of epistasis, in which combinations of mutations are either more deleterious or more beneficial than expected from the effects of the individual mutations. These results were used to generate rules governing the effects of multiple mutations on the self-assembly of the virus-like particle.
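One common way to make "more deleterious or more beneficial than expected" concrete is to measure epistasis as the deviation of a double mutant's score from the sum of its two single-mutant scores, with all scores on a log scale relative to wild type (wild type = 0). The sketch below assumes this additive null model; the study may use a different expectation. The mutation labels and the `tol` threshold are hypothetical and purely illustrative.

```python
def epistasis(single_scores, double_scores):
    """Observed minus additive-expected score for each double mutant.

    single_scores: {mutation: log-scale fitness relative to wild type}
    double_scores: {(mutation_a, mutation_b): log-scale fitness}
    """
    result = {}
    for (mut_a, mut_b), observed in double_scores.items():
        expected = single_scores[mut_a] + single_scores[mut_b]  # additive null model
        result[(mut_a, mut_b)] = observed - expected
    return result


def classify(eps, tol=0.1):
    """Label an epistasis value; tol is an arbitrary illustrative cutoff."""
    if eps > tol:
        return "positive"  # combination is more beneficial than expected
    if eps < -tol:
        return "negative"  # combination is more deleterious than expected
    return "additive"
```

With this convention, two individually mild mutations (scores -0.5 and -0.3) that together abolish assembly (score -2.0) give an epistasis value of -1.2, a strongly negative interaction.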
pellet was left to air-dry for 30 min. The dry DNA pellet was resuspended in Buffer EB (Qiagen) and submitted for sequencing.

Processing and analysis of next-generation sequencing data.

Genomes submitted for sequencing corresponded to the wild-type or mutant samples from generations 0 (wild type), 10, 20, and 30 of the initial T3 mutagenesis series, and generations 0 (wild type), 5, 10, and 15 of the T3/T7 parallel evolution series (12 samples total). The purified genomes were sequenced by QuickBiology Inc. (HiSeq X, 5M reads total per sample, 2.5M pairs, 2 × 150 PE), and raw data for reads aligned to the wild-type reference genome were provided as fastq.gz files. The fastq data for the mutant genome pools were processed using the following pipeline:
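As a generic illustration of working with the fastq.gz deliverables described above (not a reconstruction of the authors' processing pipeline, whose steps are not reproduced here), the sketch below streams a gzip-compressed FASTQ file and reports read count, mean read length, and mean Phred quality. It assumes standard four-line FASTQ records and Phred+33 quality encoding; the function names and the example file name are hypothetical. For 2 × 150 PE data, each mate file (R1, R2) would be summarized separately.

```python
import gzip
from itertools import islice


def read_fastq(path):
    """Yield (read_id, sequence, quality) from a gzip-compressed FASTQ file.

    Assumes plain four-line records: '@header', sequence, '+', quality.
    """
    with gzip.open(path, "rt") as handle:
        while True:
            record = [line.rstrip("\r\n") for line in islice(handle, 4)]
            if not record:
                return  # clean end of file
            if len(record) < 4 or not record[0].startswith("@") or not record[2].startswith("+"):
                raise ValueError("truncated or malformed FASTQ record")
            header, seq, _, qual = record
            if len(seq) != len(qual):
                raise ValueError(f"sequence/quality length mismatch in {header}")
            # Keep only the read ID (text before the first whitespace).
            yield header[1:].split()[0], seq, qual


def summarize(path, phred_offset=33):
    """Count reads and compute mean read length and mean per-base Phred score."""
    n_reads = total_bases = total_qual = 0
    for _, seq, qual in read_fastq(path):
        n_reads += 1
        total_bases += len(seq)
        total_qual += sum(ord(c) - phred_offset for c in qual)
    return {
        "reads": n_reads,
        "mean_length": total_bases / n_reads if n_reads else 0.0,
        "mean_phred": total_qual / total_bases if total_bases else 0.0,
    }
```

Streaming record by record keeps memory use constant, which matters for files on the order of millions of reads.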
This study analyzes and extends "Low-N protein engineering with data-efficient deep learning" by Biswas et al. We provide a complete, open-source, end-to-end re-implementation of the in silico protein engineering pipeline with improved computational efficiency, more detailed documentation, a cleaner API, and additional features that lower the barrier to using this pipeline as an engineering tool. We additionally perform a more thorough evaluation of the success and necessity of each step in the pipeline for in silico directed evolution, by re-implementing selected portions of the TEM-1 β-lactamase study and by applying the full in silico pipeline to two novel protein engineering tasks: increasing the melting temperature of the plastic-degrading enzyme IsPETase and improving the thermostability of the MS2 bacteriophage capsid protein. By comparing the performance of various UniRep-based feature representations, we demonstrate that linear kernels can be equivalent to additive fitness landscapes and can outperform more complex models on small or simple mutation prediction tasks. This is assumed in many previous works but has never been explicitly shown. We believe this helps to elucidate the main strength of the eUniRep representation: its ability to overcome epistatic effects when proposing extensively mutated candidate sequences with optimized functionality.
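The link between linear models and additive fitness landscapes is easiest to see with one-hot sequence features, used here as a simpler stand-in for UniRep embeddings (which are not reproduced). With one-hot inputs, a linear model assigns each residue at each site its own weight, so the predicted effect of a double mutant at two distinct sites is exactly the sum of the two single-mutant effects: the model cannot express epistasis. The sketch fits a closed-form ridge regression and checks that identity. The wild-type sequence, training data, helper names, and `alpha` are hypothetical, and this is not the authors' pipeline.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq):
    """Flattened site-by-amino-acid indicator vector of length len(seq) * 20."""
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for i, aa in enumerate(seq):
        x[i, AMINO_ACIDS.index(aa)] = 1.0
    return x.ravel()


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def predict(w, b, seq):
    return one_hot(seq) @ w + b


def mutate(seq, pos, aa):
    """Return seq with the residue at pos replaced by aa."""
    return seq[:pos] + aa + seq[pos + 1:]


def additivity_gap(w, b, wt, mut_a, mut_b):
    """Model-predicted epistasis for a double mutant at two distinct sites.

    mut_a and mut_b are (position, amino_acid) tuples. For any linear model on
    one-hot features this is zero up to floating-point error.
    """
    single_a = mutate(wt, *mut_a)
    single_b = mutate(wt, *mut_b)
    double = mutate(single_a, *mut_b)
    f_wt = predict(w, b, wt)
    effect_a = predict(w, b, single_a) - f_wt
    effect_b = predict(w, b, single_b) - f_wt
    return (predict(w, b, double) - f_wt) - (effect_a + effect_b)
```

Whatever data the model is trained on, `additivity_gap` stays at zero, so a linear one-hot model is an additive landscape by construction. Capturing epistasis therefore has to come from a nonlinear model or from features, such as learned embeddings, that are themselves nonlinear in sequence.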