Abstract: Background: Although most genes in mammalian genomes have multiple isoforms, there is ongoing debate over whether these isoforms are all functional and over the extent to which they increase the functional repertoire of the genome. To ground this debate in data, it would be helpful to have a corpus of experimentally verified cases of genes that have functionally distinct splice isoforms (FDSIs). Results: We established a curation framework for evaluating experimental evidence of FDSIs, and analyzed over 700 human and mo…
“…While the uncertainty about the number of genes strongly decreased in the past 20 years (compare Figure and Figure ), the number of generated transcripts remains largely unclear (Figure B). It steadily increased in RefSeq, GENCODE, and other databases over the past ten years, from about 60 000 in 2009 to 210 000 in 2018, but it is not clear yet, to which extent these transcripts result from erroneous splicing or are translated to a significant level, if at all …”
Section: Is There Consensus On the Low‐hanging Fruits? (mentioning)
The major transcript variants of human protein-coding genes are annotated to a certain degree of accuracy by combining manual curation, transcript data, and proteomics evidence. However, there is considerable disagreement on the annotation of about 2000 genes (which may be protein-coding, noncoding, or pseudogenes) and on the annotation of most of the predicted alternative transcripts. Pure transcriptome mapping approaches seem to be limited in discriminating functional expression from noise. These limitations have partially been overcome by dedicated algorithms that detect alternatively spliced micro-exons and wobble splice variants. Recently, knowledge about splice mechanisms and protein structure has been incorporated into an algorithm that predicts neighboring homologous exons, which are often spliced in a mutually exclusive manner. Predicted exons are evaluated by transcript data, structural compatibility, and evolutionary conservation, revealing hundreds of novel coding exons and splice mechanism re-assignments. The emerging human pan-genome necessitates distinctive annotations that incorporate differences between individuals and between populations.
“…Little research has been carried out into tissue-specific alternative splicing at the protein level. Examples of protein level tissue specificity have been highlighted in analyses of individual research papers [13,14], but there are no large-scale analyses of tissue specificity at the protein level. One reason for this is that proteomics experiments detect many fewer alternative isoforms than would be expected [15,16]. It is not clear why it is so hard to detect alternative protein isoforms.…”
Section: Introduction (mentioning)
confidence: 99%
The role of alternative splicing is one of the great unanswered questions in cellular biology. There is strong evidence for alternative splicing at the transcript level, and transcriptomics experiments show that many splice events are tissue specific. It has been suggested that alternative splicing evolved in order to remodel tissue-specific protein-protein networks. Here we investigated the evidence for tissue-specific splicing among splice isoforms detected in a large-scale proteomics analysis. Although the data supporting alternative splicing is limited at the protein level, clear patterns emerged among the small numbers of alternative splice events that we could detect in the proteomics data. More than a third of these splice events were tissue-specific and most were ancient: over 95% of splice events that were tissue-specific in both proteomics and RNAseq analyses evolved prior to the ancestors of lobe-finned fish, at least 400 million years ago. By way of contrast, three in four alternative exons in the human gene set arose in the primate lineage, so our results cannot be extrapolated to the whole genome. Tissue-specific alternative protein forms in the proteomics analysis were particularly abundant in nervous and muscle tissues and their genes had roles related to the cytoskeleton and either the structure of muscle fibres or cell-cell connections. Our results suggest that this conserved tissue-specific alternative splicing may have played a role in the development of the vertebrate brain and heart.
“…In particular, there has been a general debate as to whether protein isoforms encoded by the multiple splice forms of a particular gene are produced and functional. One extreme, based on evidence from high throughput mass-spectrometry or literature curation of verified proteoforms, contends that most genes encoding multiple alternatively spliced isoforms only produce a single functional proteoform [34,35]. In contrast, an alternative view is that alternatively spliced isoforms generate proteoforms with functionally distinct properties in terms of spatial or temporal expression, or their interaction repertoires [36,37].…”
The Drosophila shaggy gene (sgg, GSK-3) encodes multiple protein isoforms with serine/threonine kinase activity and is a key player in diverse developmental signalling pathways. Currently it is unclear whether different Sgg proteoforms are similarly involved in signalling or if different proteoforms have distinct functions. We used CRISPR/Cas9 genome engineering to tag eight different Sgg proteoform classes and determined their localization during embryonic development. We performed proteomic analysis of the two major proteoform classes and generated mutant lines for both of these for transcriptomic and phenotypic analysis. We uncovered distinct tissue-specific localization patterns for all of the tagged proteoforms we examined, most of which have not previously been characterised directly at the protein level, including one proteoform initiating with a non-standard codon. Collectively, this suggests complex developmentally regulated splicing of the sgg primary transcript. Further, affinity purification followed by mass spectrometric analyses indicates a different repertoire of interacting proteins for the two major proteoforms we examined, one with ubiquitous expression (Sgg-PB) and one with nervous system-specific expression (Sgg-PA). Specific mutation of these proteoforms shows that Sgg-PB performs the well characterised maternal and zygotic segmentation functions of the sgg locus, while Sgg-PA mutants show adult lifespan and locomotor defects consistent with its nervous system localisation. Our findings provide new insights into the role of GSK-3 proteoforms and intriguing links with the GSK-3α and GSK-3β proteins encoded by independent vertebrate genes. Our analysis suggests that different proteoforms generated by alternative splicing are likely to perform distinct functions.