RNA has become an integral building material in synthetic biology. Dominated by their secondary structures, which can be computed efficiently, RNA molecules are amenable not only to in vitro and in vivo selection, but also to rational, computation-based design. While the inverse folding problem of constructing an RNA sequence with a prescribed ground-state structure has received considerable attention for nearly two decades, there have been few efforts to design RNAs that can switch between distinct prescribed conformations. We introduce a user-friendly tool for designing RNA sequences that fold into multiple target structures. The underlying algorithm makes use of a combination of graph coloring and heuristic local optimization to find sequences whose energy landscapes are dominated by the prescribed conformations. A flexible interface allows the specification of a wide range of design goals. We demonstrate that bi- and tri-stable "switches" can be designed easily with moderate computational effort for the vast majority of compatible combinations of desired target structures. RNAdesign is freely available under the GPL-v3 license.
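The graph-coloring idea can be illustrated with a small sketch. It is not RNAdesign's code. It builds the dependency graph, in which sequence positions are vertices and the base pairs of any target structure are edges. It then uses the compatibility criterion that a sequence fitting all targets exists exactly when this graph is bipartite. This holds because every allowed pair (A-U, G-C, G-U) joins a purine with a pyrimidine, so an odd cycle cannot be satisfied. Finally, it fills in one compatible sequence. The sketch assumes plain dot-bracket input without pseudoknot brackets. The function names are hypothetical. Unlike the tool, it neither samples sequences uniformly nor performs any energy-based local optimization.

```python
import random
from collections import deque

# Canonical Watson-Crick pairs plus the G-U wobble pair.
VALID_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pair_table(structure):
    """Return the base pairs (i, j) encoded by a dot-bracket string."""
    stack, pairs = [], []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: " + structure)
            pairs.append((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced structure: " + structure)
    return pairs


def two_color(structures):
    """Build the dependency graph of all targets and 2-color it by BFS.
    Returns (color, components) or None if the graph has an odd cycle."""
    n = len(structures[0])
    if any(len(s) != n for s in structures):
        raise ValueError("all targets must have the same length")
    adj = [set() for _ in range(n)]
    for s in structures:
        for i, j in pair_table(s):
            adj[i].add(j)
            adj[j].add(i)
    color = [None] * n
    components = []
    for start in range(n):
        if color[start] is not None:
            continue
        color[start] = 0
        comp, queue = [start], deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if color[w] is None:
                    color[w] = 1 - color[v]
                    comp.append(w)
                    queue.append(w)
                elif color[w] == color[v]:
                    return None  # odd cycle: no sequence fits all targets
        components.append(comp)
    return color, components


def compatible_sequence(structures, rng=random):
    """Return one sequence compatible with every target, or None (not uniform)."""
    result = two_color(structures)
    if result is None:
        return None
    color, components = result
    seq = [None] * len(color)
    for comp in components:
        if len(comp) == 1:  # position is unpaired in every target
            seq[comp[0]] = rng.choice("ACGU")
            continue
        side = rng.randint(0, 1)    # which color class gets the G (or A/G) side
        scheme = rng.randint(0, 1)  # G pairs with C and U; U pairs with A and G
        for v in comp:
            left = color[v] == side
            if scheme == 0:
                seq[v] = "G" if left else rng.choice("CU")
            else:
                seq[v] = rng.choice("AG") if left else "U"
    return "".join(seq)
```

For example, `compatible_sequence(["((....))..", ".((....))."])` returns a 10-nt sequence that can form both hairpins. By contrast, three targets whose pairs form the triangle of positions 0, 4, and 8 yield `None`.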
Understanding the relationship between protein sequence and structure is one of the great challenges in biology. In the case of the ubiquitous coiled-coil motif, structure and occurrence have been described in extensive detail, but there is a lack of insight into the rules that govern oligomerization, i.e., how many α-helices form a given coiled coil. To shed new light on the formation of two- and three-stranded coiled coils, we developed a machine learning approach to identify rules in the form of weighted amino acid patterns. These rules form the basis of our classification tool, PrOCoil, which also visualizes the contribution of each individual amino acid to the overall oligomeric tendency of a given coiled-coil sequence. We discovered that sequence positions previously thought irrelevant to direct coiled-coil interaction have an undeniable impact on stoichiometry. Our rules also demystify the oligomerization behavior of the yeast transcription factor GCN4, which can now be described as a hybrid, part dimer and part trimer, with both theoretical and experimental justification.
Support vector machines (SVMs) are well-established standard methods for classifying biological sequences. Advantages of SVMs [2,8]:
• Maximizing the margin between two classes → proven to be a near-optimal learning strategy.
• The optimization problem is convex and quadratic → a global solution exists and can be found efficiently.
• They depend on only a few hyperparameters → easier model selection.
• They can be applied to any kind of data; all that is needed is a meaningful positive semi-definite comparison measure (the so-called kernel) → a great advantage for sequences, which cannot always be cast into vectorial data.

SVMs in a Nutshell. Consider training data {(x_i, y_i) | i = 1, …, l}, where x_i are sequences and y_i ∈ {−1, +1} are binary labels. Discriminant function of the SVM:
$$f(x) = \sum_{i=1}^{l} \alpha_i \, y_i \, k(x, x_i) + b$$
x: new data item to be classified; α_i: weights determined by SVM training (Lagrange multipliers); k(.,.): kernel function; b: offset. A new item x is assigned to the class sign(f(x)).

Sequence Kernels. A wide range is available [9], many of which can be expressed as [1]
$$k(x, x') = \sum_{p \in P} N(p, x) \, N(p, x')$$
P: set of sequence patterns; N(p,x): number of occurrences/matches of pattern p in sequence x. This formulation includes the well-known spectrum kernel [6], the mismatch kernel [5], and the spatial sample kernel [4]. To correct for varying sequence lengths, it is often useful to normalize the kernel [9]:
$$\tilde{k}(x, x') = \frac{k(x, x')}{\sqrt{k(x, x) \, k(x', x')}}$$
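The kernel formulas above can be made concrete with the spectrum kernel, where P is the set of all substrings of a fixed length. The sketch below computes N(p, x) by counting those substrings, forms the pattern kernel and its normalized version, and evaluates the discriminant function. The function names are hypothetical. The weights α_i, labels, and offset b are assumed to come from a separately trained SVM, for instance a standard solver run with a precomputed kernel matrix. No training is shown here.

```python
import math
from collections import Counter


def spectrum_features(x, kmer_len):
    """N(p, x) for every substring p of length kmer_len occurring in x."""
    return Counter(x[i:i + kmer_len] for i in range(len(x) - kmer_len + 1))


def pattern_kernel(x, y, kmer_len=2):
    """k(x, y) = sum over patterns p of N(p, x) * N(p, y)."""
    nx, ny = spectrum_features(x, kmer_len), spectrum_features(y, kmer_len)
    # Counter returns 0 for patterns absent from y, so only shared ones count.
    return sum(count * ny[p] for p, count in nx.items())


def normalized_kernel(x, y, kmer_len=2):
    """k(x, y) / sqrt(k(x, x) * k(y, y)), which removes length effects."""
    denom = math.sqrt(pattern_kernel(x, x, kmer_len) * pattern_kernel(y, y, kmer_len))
    return pattern_kernel(x, y, kmer_len) / denom if denom > 0 else 0.0


def discriminant(x, support, alphas, labels, b, kernel=normalized_kernel):
    """f(x) = sum_i alpha_i * y_i * k(x, x_i) + b; the class is sign(f(x))."""
    return sum(a * y * kernel(x, xi) for xi, a, y in zip(support, alphas, labels)) + b
```

For instance, "ABAB" and "BABA" share the 2-mers AB and BA, giving k = 2·1 + 1·2 = 4 and a normalized value of 4/5 = 0.8. Normalization maps every sequence to a unit-length feature vector, so long sequences do not dominate simply because they contain more patterns.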
Abstract
*Overview* ∣ Coiled coils are usually described as consisting of two to seven α-helices that are wrapped around each other. They can associate as either homomeric or heteromeric structures and bind in parallel or antiparallel topologies. Another characteristic of all coiled coils is the periodic recurrence of a sequence motif $[abcdefg]_n$, called the heptad repeat, where n denotes the number of heptads. In these repeats, the core positions a and d are occupied by hydrophobic amino acids, which are crucial for the tertiary structure. In contrast, positions b, c, and f are occupied by polar, hydrophilic residues, and positions e and g by charged residues.

Due to their ability to oligomerize, coiled coils are involved in a variety of important cellular functions, either on their own or as part of larger protein complexes. Hence, they are a focus of current research, for instance, as potential oncogenes and in the context of viral fusion proteins. Since their structure and occurrence are well known, it might stand to reason that we have a clear picture of coiled coils. Remarkably, however, the complex rules that govern oligomer formation, and thus the key to biological function, are poorly understood.

*Approach* ∣ To find rules that determine oligomerization, we applied support vector machines and statistical methods to classify dimers and trimers on the basis of their amino acid sequences. The data set for this classification task was collected by searching the entire RCSB Protein Data Bank for coiled-coil structures, extracting the corresponding amino acid sequences, and sorting them into oligomer types based on properties of their 3D structures. We then extracted important features in the form of specific amino acids at certain key positions, or amino acid patterns that are characteristic of each oligomer type. Amino acid patterns were retrieved from our new coiled-coil kernel, which detects amino acid co-occurrences.
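The heptad annotation and the idea of co-occurrence patterns can be sketched as follows. The exact pattern set of the coiled-coil kernel is not given here. As an assumption for illustration only, a pattern is taken to be a pair of residues at sequence distance 1 to max_dist, together with the heptad position of the first residue. The pattern counts then play the role of N(p, x) in a kernel of the form k(x, x') = Σ_p N(p, x) N(p, x'). All function names and the toy sequence are hypothetical.

```python
from collections import Counter

HEPTAD = "abcdefg"


def heptad_register(seq, start="a"):
    """Label each residue with its heptad position, beginning at `start`."""
    offset = HEPTAD.index(start)
    return "".join(HEPTAD[(offset + i) % 7] for i in range(len(seq)))


def core_residues(seq, register):
    """Residues at the hydrophobic core positions a and d."""
    return "".join(aa for aa, pos in zip(seq, register) if pos in "ad")


def cooccurrence_patterns(seq, register, max_dist=3):
    """Count hypothetical patterns (aa_i, pos_i, distance, aa_j) for all
    residue pairs i < j with j - i <= max_dist."""
    if len(seq) != len(register):
        raise ValueError("sequence and register must have the same length")
    counts = Counter()
    for i in range(len(seq)):
        for d in range(1, max_dist + 1):
            j = i + d
            if j >= len(seq):
                break
            counts[(seq[i], register[i], d, seq[j])] += 1
    return counts


def cooccurrence_kernel(c1, c2):
    """Sum over shared patterns p of N(p, x) * N(p, x')."""
    return sum(n * c2[p] for p, n in c1.items())
```

With the toy sequence "LEELKKELEELKKE" registered from position a, `core_residues` returns "LLLL", i.e., the leucines at a and d. Keeping the heptad position in each pattern lets the same residue pair contribute differently depending on where it sits in the repeat.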
The relevance of the selected patterns was confirmed by statistical tests and by the high classification accuracy measured in cross-validation.

*Results* ∣ We discovered that a complex network of amino acid dependencies and sequence positions previously thought irrelevant to direct coiled-coil interaction have an undeniable impact on stoichiometry. Our online tool PrOCoil classifies coiled coils with an accuracy of 86% and can also visualize the contribution of each individual amino acid to the overall oligomeric tendency of a given coiled-coil sequence. A "web version":http://www.bioinf.jku.at/software/procoil/ and an "R package":http://www.bioinf.jku.at/software/procoil/procoilR.html of our prediction and profiling software (PrOCoil) are available to the scientific community.
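How a per-residue profile can arise from a pattern-based SVM is sketched below. This is one possible scheme, assumed here and not taken from PrOCoil. For the unnormalized kernel k(x, x') = Σ_p N(p, x) N(p, x'), the discriminant can be rewritten as f(x) = Σ_p w_p N(p, x) + b with pattern weights w_p = Σ_i α_i y_i N(p, x_i). Each pattern occurrence involves two residues, so its weight is split equally between them. The per-residue scores plus b then sum exactly to f(x). For brevity, the hypothetical pattern (aa_i, distance, aa_j) omits heptad positions.

```python
from collections import Counter


def pattern_occurrences(seq, max_dist=3):
    """Yield (pattern, i, j) for residue pairs at distance 1..max_dist,
    using the hypothetical pattern (aa_i, distance, aa_j)."""
    for i in range(len(seq)):
        for d in range(1, max_dist + 1):
            j = i + d
            if j < len(seq):
                yield (seq[i], d, seq[j]), i, j


def pattern_weights(support, alphas, labels, max_dist=3):
    """w_p = sum_i alpha_i * y_i * N(p, x_i) for the unnormalized kernel."""
    w = Counter()
    for x, a, y in zip(support, alphas, labels):
        for p, _, _ in pattern_occurrences(x, max_dist):
            w[p] += a * y
    return w


def profile(seq, w, max_dist=3):
    """Split each occurrence's weight w_p equally between its two residues."""
    scores = [0.0] * len(seq)
    for p, i, j in pattern_occurrences(seq, max_dist):
        scores[i] += w[p] / 2
        scores[j] += w[p] / 2
    return scores
```

Positive scores push a residue toward the class labeled +1, and negative scores push it toward the class labeled −1. With a normalized kernel, the weights would additionally be scaled by the feature-vector norms of x and of each x_i.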