While deep learning has revolutionized protein structure prediction, almost all experimentally characterized de novo protein designs have been generated using physically based approaches such as Rosetta. Here we describe a deep learning-based protein sequence design method, ProteinMPNN, with outstanding performance in both in silico and experimental tests. The amino acid sequence at different positions can be coupled between single or multiple chains, enabling application to a wide range of current protein design challenges. On native protein backbones, ProteinMPNN has a sequence recovery of 52.4%, compared to 32.9% for Rosetta. Incorporation of noise during training improves sequence recovery on protein structure models and produces sequences that more robustly encode their structures, as assessed using structure prediction algorithms. We demonstrate the broad utility and high accuracy of ProteinMPNN using X-ray crystallography, cryoEM and functional studies by rescuing previously failed designs, made using Rosetta or AlphaFold, of protein monomers, cyclic homo-oligomers, tetrahedral nanoparticles, and target binding proteins.
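The abstract reports sequence recovery without defining it. A minimal sketch follows, assuming the common definition: the fraction of positions at which the designed residue is identical to the native residue for the same backbone. The function name and the toy sequences are hypothetical and are not taken from the paper or from the ProteinMPNN code.

```python
def sequence_recovery(native: str, designed: str) -> float:
    """Fraction of positions where the designed residue equals the native one.

    Assumes both sequences describe the same backbone, so they are already
    aligned position by position and have equal length.
    """
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)


# Toy sequences invented for illustration only (not real design data).
native = "MKTAYIAKQR"
designed = "MKSAYLAKQK"
print(f"{sequence_recovery(native, designed):.1%}")  # 7 of 10 positions match
```

Under this definition, the reported figures would correspond to about half of native residues recovered by ProteinMPNN versus about one third by Rosetta, averaged over the benchmark backbones.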
RNA has enormous potential as a therapeutic, yet its successful application depends on efficient delivery strategies. In this study, we demonstrate that a designed artificial viral coat protein, which self-assembles with DNA to form rod-shaped virus-like particles (VLPs), also encapsulates and protects mRNA encoding enhanced green fluorescent protein (EGFP) and luciferase, and yields cellular expression of these proteins. The artificial viral coat protein consists of an oligolysine (K) for binding to the oligonucleotide, a silk protein-like midblock S = (GAGAGAGQ) that self-assembles into stiff rods, and a long hydrophilic random coil block C that shields the nucleic acid cargo from its environment. With mRNA, the C-S-K protein coassembles to form rod-shaped VLPs, each encapsulating about one to five mRNA molecules. Inside the rod-shaped VLPs, the mRNAs are protected against degradation by RNases, and the VLPs also maintain their shape following incubation with serum. Despite the lack of cationic surface charge, the mRNA VLPs transfect cells with both EGFP and luciferase, although with much lower efficiency than that obtained with a lipoplex transfection reagent. The VLPs have negligible toxicity and minimal hemolytic activity. Our results demonstrate that VLPs yield efficient packaging and shielding of mRNA and create the basis for implementation of additional virus-like functionalities, such as targeting functionalities, to improve transfection and cell specificity.
We propose to exploit multivalent binding of solid-binding peptides (SBPs) for the physical attachment of antifouling polypeptide brushes on solid surfaces. Using a silica-binding peptide as a model SBP, we find that both tandem-repeated SBPs and SBPs repeated in branched architectures implemented via a multimerization domain work very well to improve the binding strength of polypeptide brushes, as compared to earlier designs with a single SBP. At the same time, for many of the designed sequences, either the solubility or the yield of recombinant production is low. For a single design, with the domain structure B-M-E, both solubility and yield of recombinant production were high. In this design, B is a silica-binding peptide, M is a highly thermostable, de novo-designed trimerization domain, and E is a hydrophilic elastin-like polypeptide. We show that the B-M-E triblock polypeptide rapidly assembles into highly stable polypeptide brushes on silica surfaces, with excellent antifouling properties against high concentrations of serum albumin. Given that SBPs attaching to a wide range of materials have been identified, the B-M-E triblock design provides a template for the development of polypeptides for coating many other materials such as metals or plastics.
The design of novel protein-protein interfaces using physics-based design methods such as Rosetta requires substantial computational resources and manual refinement by expert structural biologists. A new generation of deep learning methods promises to simplify protein-protein interface design and enable its application to a wide variety of problems by researchers from various scientific disciplines. Here we test the ability of a deep learning method for protein sequence design, ProteinMPNN, to design two-component tetrahedral protein nanomaterials and benchmark its performance against Rosetta. ProteinMPNN had a similar success rate to Rosetta, yielding 13 new experimentally confirmed assemblies, but required orders of magnitude less computation and no manual refinement. The interfaces designed by ProteinMPNN were substantially more polar than those designed by Rosetta, which facilitated in vitro assembly of the designed nanomaterials from independently purified components. Crystal structures of several of the assemblies confirmed the accuracy of the design method at high resolution. Our results showcase the potential of deep learning-based methods to unlock the widespread application of designed protein-protein interfaces and self-assembling protein nanomaterials in biotechnology.