Nucleotide variants can cause functional changes by altering protein–RNA binding in various ways that are not easy to predict. This can affect processes such as splicing, nuclear shuttling, and stability of the transcript. Therefore, correct modeling of protein–RNA binding is critical when predicting the effects of sequence variations. Many RNA-binding proteins recognize a diverse set of motifs, and binding is typically also dependent on the genomic context, making this task particularly challenging. Here, we present DeepCLIP, the first method for context-aware modeling and prediction of protein binding to RNA using exclusively sequence data as input. We show that DeepCLIP outperforms existing methods for modeling protein–RNA binding. Importantly, we demonstrate that DeepCLIP predictions correlate with the functional outcomes of nucleotide variants in independent wet lab experiments. Furthermore, we show how DeepCLIP binding profiles can be used in the design of therapeutically relevant antisense oligonucleotides, and to uncover possible position-dependent regulation in a tissue-specific manner. DeepCLIP is freely available as a stand-alone application and as a webtool at http://deepclip.compbio.sdu.dk.
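The idea of scoring a variant from sequence alone can be illustrated with a toy example. The sketch below is not DeepCLIP's code or API: it one-hot encodes an RNA sequence, scores each window against a made-up 3-nt motif, and reports how a single-nucleotide change shifts the best window score. The function names (`one_hot`, `window_scores`, `variant_effect`), the motif weights, and the example sequences are all hypothetical, and a fixed weight matrix stands in for a trained neural network.

```python
# Illustrative sketch only: NOT DeepCLIP's implementation.
# All names, weights, and sequences here are hypothetical.

ALPHABET = "ACGU"


def one_hot(seq):
    """Encode an RNA sequence as rows of 0/1 values in A, C, G, U order."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    # Bases outside ACGU (e.g. N) become an all-zero row.
    return [[1 if base == letter else 0 for letter in ALPHABET] for base in seq]


# Hypothetical 3-nt motif: rows are motif positions, columns are A, C, G, U.
TOY_MOTIF = [
    [0.0, 0.0, 1.0, 0.0],  # prefers G
    [0.0, 0.0, 0.0, 1.0],  # prefers U
    [0.0, 0.0, 1.0, 0.0],  # prefers G
]


def window_scores(seq, motif=TOY_MOTIF):
    """Return one score per window: a crude per-position binding profile."""
    encoded = one_hot(seq)
    width = len(motif)
    scores = []
    for start in range(len(encoded) - width + 1):
        total = 0.0
        for offset in range(width):
            row = encoded[start + offset]
            total += sum(w * x for w, x in zip(motif[offset], row))
        scores.append(total)
    return scores


def variant_effect(reference, variant, motif=TOY_MOTIF):
    """Best window score of the variant minus that of the reference."""
    return max(window_scores(variant, motif)) - max(window_scores(reference, motif))


reference = "AAGUGAA"  # contains the toy motif GUG
variant = "AAGCGAA"    # U>C change inside the motif
print(window_scores(reference))
print(variant_effect(reference, variant))
```

A negative value from `variant_effect` means the variant weakens the best match to the toy motif. A context-aware model would replace the fixed weight matrix with learned parameters that also take the surrounding sequence into account.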
It is now widely accepted that aberrant splicing of constitutive exons is often caused by mutations affecting cis‐acting splicing regulatory elements, but there is a misconception that all exons have an equal dependency on splicing regulatory elements and thus a similar susceptibility to aberrant splicing. We investigated exonic mutations in ACADM exon 5 to experimentally examine their effect on splicing and found that 7 out of 11 tested mutations affected exon inclusion, demonstrating that this constitutive exon is particularly vulnerable to exonic splicing mutations. Employing ACADM exons 5 and 6 as models, we demonstrate that the balance between splicing enhancers and silencers, flanking intron length, and flanking splice site strength are important factors that determine exon definition and splicing efficiency of the exon in question. Our study shows that two constitutive exons in ACADM have different inherent vulnerabilities to exonic splicing mutations. This suggests that in silico prediction of potentially pathogenic effects on splicing from exonic mutations may be improved by also considering the inherent vulnerability of the exon. Moreover, we show that single nucleotide polymorphisms that affect either of two different exonic splicing silencers, located far apart in exon 5, all protect against both immediately flanking and more distant exonic splicing mutations.
Nucleotide variants can cause functional changes by altering protein-RNA binding in various and subtle ways that are not easy to predict. This can affect processes such as splicing, nuclear shuttling, and stability of the transcript. Therefore, correct modeling of protein-RNA binding is critical when predicting the effects of sequence variations. Many RNA-binding proteins recognize a diverse set of motifs, and binding is typically also dependent on the genomic context, making this task particularly challenging. Although existing protein binding site models draw on additional data sources, such as RNA structure and functional gene context, to capture context, they still need improvement and they have not been developed to predict the effect of sequence variants. Here, we present DeepCLIP, the first method for context-aware modeling and prediction of protein binding to nucleic acids using exclusively sequence data as input. We show that DeepCLIP outperforms existing methods for modeling protein-RNA binding. Importantly, we demonstrate that DeepCLIP is able to reliably predict the functional effects of context-dependent nucleotide variants in independent wet lab experiments. Furthermore, we show how DeepCLIP binding profiles can be used in the design of therapeutically relevant antisense oligonucleotides, and to uncover possible position-dependent regulation in a tissue-specific manner. DeepCLIP can be freely used at http://deepclip.compbio.sdu.dk.
Highlights
We have designed DeepCLIP as a simple neural network that requires only CLIP binding sites as input. The architecture and parameter settings of DeepCLIP make it an efficient classifier that is robust to train, so high-performing models are easy to train and recreate.
Using an extensive benchmark dataset, we demonstrate that DeepCLIP outperforms existing tools in classification. Furthermore, DeepCLIP provides direct information about the neural network's decision process through visualization of binding motifs and a binding profile that directly indicates sequence elements contributing to the classification.
To show that DeepCLIP models generalize to different datasets, we have demonstrated that predictions correlate with in vivo and in vitro experiments using quantitative binding assays and minigenes.
Identifying the binding sites for regulatory RNA-binding proteins is fundamental for efficient design of (therapeutic) antisense oligonucleotides. Employing a reported disease-associated mutation, we demonstrate that DeepCLIP can be used for design of therapeutic antisense oligonucleotides that block regions important for binding of regulatory proteins and correct aberrant splicing.
Using DeepCLIP binding profiles, we uncovered a possible position-dependent mechanism behind the reported tissue-specificity of a group of TDP-43 repressed pseudoexons.
We have made DeepCLIP available as an online tool for training and application of protein-RNA binding deep learning models and prediction of the potential effects of clinically detected sequence vari...