of RBPs for which we obtained a motif bound to short linear sequences, whereas ~30% preferred structured motifs folding into stem-loops. We also found that many RBPs can bind to multiple distinctly different motifs. Analysis of the matches of the motifs on human genomic sequences suggested novel roles for many RBPs in regulation of splicing, and revealed RBPs that are likely to control specific classes of transcripts. Global analysis of the motifs also revealed an enrichment of G and U nucleotides. Masking of G and U by proteins increases the specificity of RNA folding, as both G and U can pair to two other RNA bases via canonical Watson-Crick or G-U base pairs. The collection containing 145 high-resolution binding specificity models for 86 RBPs is the largest systematic resource for the analysis of human RBPs, and will greatly facilitate future analysis of the various biological roles of this important class of proteins.
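The pairing argument above can be made concrete with a minimal sketch. It assumes only the three pair types named in the text (A-U and G-C Watson-Crick pairs plus the G-U pair); the names `PAIRS` and `pairing_partners` are illustrative and do not come from the study.

```python
# Illustrative only: the three base-pair types named in the text,
# written as unordered pairs (A-U and G-C Watson-Crick, plus G-U).
PAIRS = [("A", "U"), ("G", "C"), ("G", "U")]


def pairing_partners(pairs=PAIRS):
    """Return, for each RNA base, the set of bases it can pair with."""
    partners = {base: set() for base in "ACGU"}
    for x, y in pairs:
        partners[x].add(y)
        partners[y].add(x)
    return partners


partners = pairing_partners()
for base in "ACGU":
    # Print each base with its sorted list of possible partners.
    print(base, sorted(partners[base]))
```

Under these assumptions G and U each have two possible partners while A and C have one, which is why a protein that masks G or U removes more alternative pairings than one that masks A or C.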
bioRxiv preprint first posted online May. 9, 2018; doi: http://dx.doi.org/10.1101/317909. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

INTRODUCTION

The abundance of protein and RNA molecules in a cell depends both on their rates of production and degradation. These rates are determined directly or indirectly by the sequence of DNA. The transcription rate of RNA and the rate of degradation of proteins are determined by DNA and protein sequences, respectively. However, most regulatory steps that control gene expression are influenced by the sequence of the RNA itself. These processes include RNA splicing, localization, stability, and translation. These processes can be affected by RNA-binding proteins (RBPs) that specifically recognize short RNA sequence elements (Glisovic et al., 2008).

RBPs can recognize their target sites using two mechanisms: they can form direct contacts to the RNA bases of an unfolded RNA chain, and/or recognize folded RNA structures (Loughlin et al., 2009 ...

... 256 oligonucleotides, and the desired RBP is then used to select its target sites followed by detection of the bound sites using a second microarray. RNAcompete has been used to ...

... given protein with 15 amino acids of flanking sequence (see Table S1 for details). Constructs containing subsets of RBDs were also analyzed for some very large RBPs. Taken together, our clone collection covered 942 distinct proteins.
The RBPs were expressed in E. coli as fusion proteins with thioredoxin, incorporating an N-terminal 13...