2017
DOI: 10.1038/s41598-017-15635-8

Random protein sequences can form defined secondary structures and are well-tolerated in vivo

Abstract: The protein sequences found in nature represent a tiny fraction of the potential sequences that could be constructed from the 20-amino-acid alphabet. To help define the properties that shaped proteins to stand out from the space of possible alternatives, we conducted a systematic computational and experimental exploration of random (unevolved) sequences in comparison with biological proteins. In our study, combinations of secondary structure, disorder, and aggregation predictions are accompanied by experimenta…
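The abstract's premise, that natural proteins occupy a tiny fraction of the space spanned by the 20-amino-acid alphabet, can be made concrete with a short Python sketch. It assumes the simplest definition of a random (unevolved) sequence: every position is drawn uniformly and independently from the 20 standard amino acids. The function names are hypothetical, and this is not presented as the authors' library-design procedure.

```python
import random

# The 20 standard amino acids, as one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sequence_space_size(length: int) -> int:
    """Number of distinct sequences of the given length over a 20-letter alphabet (20**length)."""
    return len(AMINO_ACIDS) ** length


def random_sequence(length: int, rng: random.Random) -> str:
    """Draw one unevolved sequence, sampling each position uniformly and independently."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the draw is reproducible
    print(random_sequence(50, rng))
    # 20**100 is a 131-digit number
    print(f"{sequence_space_size(100):.2e}")  # 1.27e+130
```

Even at 100 residues the space holds about 1.27 × 10^130 sequences, which is why exhaustive comparison with natural proteins is impossible and random sampling is used instead.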

Cited by 72 publications (108 citation statements: 12 supporting, 96 mentioning, 0 contrasting)
References 45 publications (41 reference statements)
“…The translated rORFs of ambigrammatic narnaviruses are predicted to have median α-helical and β-strand contents of 22 % and 12 %, respectively (calculated using JPred4 [35]). This degree of secondary structure is consistent with a structured (or folded) protein, but a significant presence of secondary structure can also be observed in random amino-acid sequences [36]. We further note that the isoelectric point (pI) of the RdRp is high (median 10.4, range 7.7-11.6 for sequences >2 kb in Figure 3), due to a high frequency of Arg, a basic amino acid (median 9.9 %, range 6-13.3 %).…”
Section: Exploration of the Possible Secondary Structure in the rORF (supporting)
confidence: 71%
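The statement above summarizes per-sequence Arg frequency by its median and range and reports an isoelectric point. A minimal Python sketch of both quantities follows; it is not JPred4 or the citing authors' pipeline. The pI is found by bisection on a Henderson–Hasselbalch net-charge model, and the pKa table is an assumed approximate set, so values will differ somewhat from dedicated tools. The sequences in the usage block are hypothetical toy inputs.

```python
import statistics

# Assumed approximate pKa values for ionizable groups. Published tools use
# slightly different tables, so pI estimates will vary by a few tenths.
POSITIVE_PKA = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def residue_percent(seq: str, residue: str = "R") -> float:
    """Percentage of positions in seq occupied by the given residue."""
    return 100.0 * seq.count(residue) / len(seq)


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of a linear peptide at the given pH."""
    counts = {aa: seq.count(aa) for aa in set(seq)}
    counts["Nterm"] = counts["Cterm"] = 1  # one free amino and one free carboxyl terminus
    positive = sum(counts.get(g, 0) / (1 + 10 ** (ph - pka)) for g, pka in POSITIVE_PKA.items())
    negative = sum(counts.get(g, 0) / (1 + 10 ** (pka - ph)) for g, pka in NEGATIVE_PKA.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect on [0, 14]; net charge falls monotonically with pH, so one zero crossing exists."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still net positive, so the pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    # Hypothetical toy sequences, not narnavirus data.
    seqs = ["MRRAKLRGSE", "MDEAKLRGSR", "MRLRSRVRWK"]
    arg = [residue_percent(s) for s in seqs]
    print("Arg %: median", statistics.median(arg), "range", min(arg), "-", max(arg))
    print("pI:", [round(isoelectric_point(s), 1) for s in seqs])
```

The sketch reflects the link drawn in the statement: raising the Arg share pushes the zero-charge crossing toward high pH, consistent with an Arg-rich RdRp having a high pI.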
“…In another study of random sequence proteins of a similar length to those described here, Tretyachenko et al. reported that 53 % of random 20 AA sequences with similar amino acid usage to modern proteins could be expressed in E. coli (8/15). Finally, a 20 AA random library of proteins 95 amino acids in length created by Urabe and colleagues contained only 20 % solubly expressed variants (as detected by western blot).…”
Section: Discussion (mentioning)
confidence: 54%
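The statement above contrasts random libraries whose amino acid usage resembles modern proteins and reports expression as simple fractions (8 of 15, about 53 %). A minimal Python sketch, assuming "similar amino acid usage" means each position is sampled independently with probability proportional to a background composition; the composition dictionary is a hypothetical placeholder, not measured proteome frequencies, and the function names are illustrative.

```python
import random


def weighted_random_sequence(length: int, composition: dict[str, float], rng: random.Random) -> str:
    """Sample each position independently, with probability proportional to composition weights."""
    residues = list(composition)
    weights = [composition[aa] for aa in residues]
    return "".join(rng.choices(residues, weights=weights, k=length))


def percent(hits: int, total: int) -> float:
    """Share of library members scoring positive (e.g. expressed), as a percentage."""
    return 100.0 * hits / total


if __name__ == "__main__":
    # Hypothetical placeholder weights; substitute a measured proteome composition.
    composition = {"A": 8.0, "L": 10.0, "G": 7.0, "S": 7.0, "E": 6.5, "K": 6.0, "V": 7.0, "D": 5.5}
    rng = random.Random(7)
    library = [weighted_random_sequence(95, composition, rng) for _ in range(15)]
    print(library[0])
    print(f"{percent(8, 15):.1f} %")  # 53.3 %
```

Passing a uniform composition over all 20 residues reduces this to the unbiased case; weighting lets a library mimic natural usage, which is the distinction the cited studies draw.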
“…Young genes are known to have higher ISD than old genes, with high ISD at the moment of gene birth facilitating the process [52], perhaps because cells tolerate them better [48]. Domains that were more recently born de novo also have higher ISD [3], [5], [14], [26].…”
Section: Introduction (mentioning)
confidence: 99%