Although over 40 type 1 diabetes (T1D) risk loci have been mapped in humans, the causative genes and variants for T1D are largely unknown. Here, we investigated a candidate gene in the 21q22.3 risk locus, UBASH3A, which is primarily expressed in T cells where it is thought to play a largely redundant role. Genetic variants in UBASH3A have been shown to be associated with several autoimmune diseases in addition to T1D. However, the molecular mechanism underlying these genetic associations is unresolved. Our study reveals a previously unrecognized role of UBASH3A in human T cells: UBASH3A attenuates NF-κB signal transduction upon T-cell receptor (TCR) stimulation by specifically suppressing the activation of the IκB kinase complex. We identify novel interactions of UBASH3A with nondegradative polyubiquitin chains, TAK1 and NEMO, suggesting that UBASH3A regulates the NF-κB signaling pathway by a ubiquitin-dependent mechanism. Finally, we show that risk alleles at rs11203203 and rs80054410, two T1D-associated variants in UBASH3A, increase UBASH3A expression in human primary CD4+ T cells upon TCR stimulation, inhibiting NF-κB signaling via its effects on the IκB kinase complex and resulting in reduced IL2 gene expression.
Recent advances in long-read sequencing solve inaccuracies in alternative transcript identification of full-length transcripts in short-read RNA-Seq data, which encourages the development of methods for isoform-centered functional analysis. Here, we present tappAS, the first framework to enable a comprehensive Functional Iso-Transcriptomics (FIT) analysis, which is effective at revealing the functional impact of context-specific post-transcriptional regulation. tappAS uses isoform-resolved annotation of coding and non-coding functional domains, motifs, and sites, in combination with novel analysis methods to interrogate different aspects of the functional readout of transcript variants and isoform regulation. tappAS software and documentation are available at https://app.tappas.org.
Genome-wide association studies (GWAS) have identified multiple, shared allelic associations with many autoimmune diseases. However, the pathogenic contributions of variants residing in risk loci remain unresolved. The location of the majority of shared disease-associated variants in noncoding regions suggests they contribute to risk of autoimmunity through effects on gene expression in the immune system. In the current study, we test this hypothesis by applying RNA sequencing to CD4, CD8, and CD19 lymphocyte populations isolated from 81 subjects with type 1 diabetes (T1D). We characterize and compare the expression patterns across these cell types for three gene sets: all genes, the set of genes implicated in autoimmune disease risk by GWAS, and the subset of these genes specifically implicated in T1D. We performed RNA sequencing and aligned the reads to both the human reference genome and a catalog of all possible splicing events developed from the genome, thereby providing a comprehensive evaluation of the roles of gene expression and alternative splicing (AS) in autoimmunity. Autoimmune candidate genes displayed greater expression specificity in the three lymphocyte populations relative to other genes, with significantly increased levels of splicing events, particularly those predicted to have substantial effects on protein isoform structure and function (e.g., intron retention, exon skipping). The majority of single-nucleotide polymorphisms within T1D-associated loci were also associated with one or more cis-expression quantitative trait loci (cis-eQTLs) and/or splicing eQTLs. Our findings highlight a substantial, and previously underrecognized, role for AS in the pathogenesis of autoimmune disorders and particularly for T1D.
Parkinson’s disease (PD) is a complex neurodegenerative disorder influenced by a combination of genetic and environmental factors. The molecular mechanisms that underlie PD are unknown; however, oxidative stress and impairment of antioxidant defence mechanisms have been implicated as major contributors to disease pathogenesis. Previously, we have reported a PD patient-derived cellular model generated from biopsies of the olfactory mucosa, termed hONS cells, in which the NRF2-mediated antioxidant response pathway genes were among the most differentially expressed. To date, few studies have examined the role of the NRF2-encoding gene, NFE2L2, in PD. In this study, we comprehensively assessed whether rare and common NFE2L2 genetic variations modify susceptibility to PD using a large Australian case-control sample (PD=1338, controls=1379). We employed a haplotype-tagging approach that identified an association between the tagging SNP rs2364725 and PD (OR = 0.849 (0.760-0.948), P = 0.004). Further genetic screening in hONS cell lines produced no obvious pathogenic variants in the coding regions of NFE2L2. Finally, we investigated the relationship between xenobiotic exposures and NRF2 function by testing gene-environment interactions between NFE2L2 SNPs and smoking or pesticide exposure. Our results demonstrated a significant interaction between rs2706110 and pesticide exposure (OR = 0.597 (0.393-0.900), P = 0.014). In addition, we were able to identify some age-at-onset modifying SNPs and replicate an ‘early-onset’ haplotype that contains a previously identified ‘functional promoter’ SNP (rs6721961). Our results suggest a role of NFE2L2 genetic variants in modifying PD susceptibility and onset. Our findings also support the utility of testing gene-environment interactions in genetic studies of PD.
In omics experiments, variable selection involves a large number of metabolites/genes and a small number of samples (the n < p problem). The ultimate goal is often the identification of one or a few features that differ among conditions: a biomarker. Complicating biomarker identification, the p variables often contain a correlation structure due to the biology of the experiment, which makes it difficult to distinguish causal compounds from correlated compounds. Additionally, there may be elements in the experimental design (blocks, batches) that introduce structure in the data. While this problem has been discussed in the literature and various strategies proposed, the overfitting problems concomitant with such approaches are rarely acknowledged. Instead of viewing a single omics experiment as a definitive test for a biomarker, an unrealistic analytical goal, we propose to view such studies as screening studies where the goal of the study is to reduce the number of features present in the second round of testing, and to limit the Type II error. Using this perspective, the performance of LASSO, ridge regression and Elastic Net was compared with the performance of an ANOVA via a simulation study and two real data comparisons. Interestingly, a dramatic increase in the number of features had no effect on Type I error for the ANOVA approach. ANOVA, even without multiple test correction, has a low false positive rate in the scenarios tested. The Elastic Net has an inflated Type I error (from 10 to 50%) for small numbers of features, which increases with sample size. The Type II error rate for the ANOVA is comparable to or lower than that for the Elastic Net, leading us to conclude that an ANOVA is an effective analytical tool for the initial screening of features in omics experiments.
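The screening view described in this abstract can be illustrated with a minimal sketch, assuming a per-feature one-way ANOVA whose F statistics are used to shortlist features for a second round of testing. The function names, the simulated data, the effect size, and the shortlist size of 10 are all illustrative assumptions and are not taken from the study; the study's own simulation design and thresholds are not reproduced here.

```python
import random
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of numeric values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def screen_features(data, labels, keep):
    """Return indices of the `keep` features with the largest F statistics.

    data: list of samples, each a list of feature values.
    labels: condition label for each sample, in the same order.
    """
    conditions = sorted(set(labels))
    scores = []
    for j in range(len(data[0])):
        groups = [[row[j] for row, lab in zip(data, labels) if lab == c]
                  for c in conditions]
        scores.append((anova_f(groups), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:keep]]


# Toy n < p simulation (hypothetical): 12 samples, 200 features.
random.seed(1)
n_per_group, n_features = 6, 200
labels = ["A"] * n_per_group + ["B"] * n_per_group
data = [[random.gauss(0.0, 1.0) for _ in range(n_features)] for _ in labels]
for row, lab in zip(data, labels):
    if lab == "B":
        row[0] += 6.0  # feature 0 is the only truly different feature

shortlist = screen_features(data, labels, keep=10)
print(shortlist)
```

Ranking by F and keeping a fixed number of features is only one way to form a shortlist; a p-value cutoff from the F distribution would serve the same screening purpose.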
Traditionally, the functional analysis of gene expression data has used pathway and network enrichment algorithms. These methods are usually gene-centric rather than transcript-centric and hence fall short of unraveling functional roles associated with posttranscriptional regulatory mechanisms such as Alternative Splicing (AS) and Alternative PolyAdenylation (APA), jointly referred to here as Alternative Transcript Processing (AltTP). Moreover, short-read RNA-seq has serious limitations in resolving full-length transcripts, further complicating the study of isoform expression. Recent advances in long-read sequencing open exciting opportunities for studying isoform biology and function. However, there are no established bioinformatics methods for the functional analysis of isoform-resolved transcriptomics data to fully leverage these technological advances. Here we present a novel framework for Functional Iso-Transcriptomics analysis (FIT). This framework uses a rich isoform-level annotation database of functional domains, motifs and sites, both coding and noncoding, and introduces novel analysis methods to interrogate different aspects of the functional relevance of isoform complexity. The Functional Diversity Analysis (FDA) evaluates the variability in the inclusion/exclusion of functional domains across annotated transcripts of the same gene. Parameters can be set to evaluate whether AltTP partially or fully disrupts functional elements. FDA is a measure of the potential of a multiple-isoform transcriptome to have a functional impact. By combining these functional labels with expression data, the Differential Analysis Module evaluates the relative contribution of transcriptional (i.e. gene level) and post-transcriptional (i.e. transcript/protein levels) regulation to the biology of the system.
Measures of inclusion of NLS, transmembrane domains or DNA binding motifs, for example. Some of these findings were experimentally validated by others and us. In summary, we propose a novel framework for the functional analysis of transcriptomes at isoform resolution. We anticipate the tappAS tool will be an important resource for the adoption of Functional Iso-Transcriptomics analysis by the functional genomics community.
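The idea behind the Functional Diversity Analysis, checking whether a functional feature is included in some isoforms of a gene but excluded from others, can be sketched as follows. This is a simplified presence/absence illustration under stated assumptions, not the tappAS implementation: the function name, the input layout, and the gene, transcript, and feature labels are hypothetical, and the partial versus full disruption parameter mentioned in the abstract is not modeled.

```python
def varying_features(isoform_features):
    """Find features that are present in some but not all isoforms of a gene.

    isoform_features: dict mapping gene -> dict mapping transcript -> set of
    functional feature labels annotated on that transcript.
    Returns a dict mapping gene -> sorted list of varying feature labels.
    """
    result = {}
    for gene, transcripts in isoform_features.items():
        feature_sets = list(transcripts.values())
        if len(feature_sets) < 2:
            # A single-isoform gene cannot vary in feature inclusion.
            result[gene] = []
            continue
        in_any = set().union(*feature_sets)
        in_all = set.intersection(*feature_sets)
        result[gene] = sorted(in_any - in_all)
    return result


# Hypothetical annotation: geneA loses its NLS in one isoform.
annotation = {
    "geneA": {"tx1": {"NLS", "DNA_binding"}, "tx2": {"DNA_binding"}},
    "geneB": {"tx1": {"TM"}},
}
print(varying_features(annotation))
```

A gene with a non-empty list is one where alternative transcript processing has the potential to change function, which is the question the analysis asks before expression data are considered.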
Genes involved in familial dystonia syndromes (DYT genes) are ideal candidates for investigating whether common genetic variants influence the susceptibility to sporadic primary dystonia. To date, there have been few candidate gene studies for primary dystonia and only two DYT genes, TOR1A and THAP1, have been assessed. We therefore employed a haplotype-tagging strategy to comprehensively assess whether common polymorphisms in eight DYT genes (TOR1A, TAF1, GCH1, THAP1, MR-1 (PNKD), SGCE, ATP1A3 and PRKRA) confer risk for sporadic primary dystonia. The 230 primary dystonia cases were matched for age and gender to 228 controls, recruited from movement disorder clinics in Brisbane, Australia and the Australian electoral roll. All subjects were genotyped for 56 tagging SNPs and genotype associations were investigated. Modest genotypic associations (P<0.05) were observed for three GCH1 SNPs (rs12147422, rs3759664 and rs10483639) when comparing all cases against controls. Associations were also seen when the cases were stratified based on presentation. Overall, our findings do not support the hypothesis that common TOR1A variants affect susceptibility to sporadic primary dystonia, and they suggest that common variants around the DYT genes are unlikely to confer substantial risk for sporadic primary dystonia. Further work is warranted to follow up the GCH1 SNPs and the subgroup analyses.
Alternative splicing leverages genomic content by allowing the synthesis of multiple transcripts and, by implication, protein isoforms, from a single gene. However, estimating the abundance of transcripts produced in a given tissue from short sequencing reads is difficult and can result in both the construction of transcripts that do not exist and the failure to identify true transcripts. An alternative approach is to catalog the events that make up isoforms (splice junctions and exons). We present here the Event Analysis (EA) approach, where we project transcripts onto the genome and identify overlapping/unique regions and junctions. In addition, all possible logical junctions are assembled into a catalog. Transcripts are filtered before quantitation based on simple measures: the proportion of the events detected, and the coverage. We find that mapping to a junction catalog is more efficient at detecting novel junctions than mapping in a splice-aware manner. We identify 99.8% of true transcripts, while iReckon identifies 82% of the true transcripts and creates more transcripts not included in the simulation than were initially used in the simulation. Using PacBio Iso-seq data from a mouse neural progenitor cell model, EA detects 60% of the novel junctions that are combinations of existing exons, while only 43% are detected by STAR. EA further detects ∼5,000 annotated junctions missed by STAR. Filtering transcripts based on the proportion of the transcript detected and the average number of reads supporting that transcript captures 95% of the PacBio transcriptome. Filtering the reference transcriptome before quantitation results in a more stable estimate of isoform abundance, with improved correlation between replicates.
This was particularly evident when EA was applied to an RNA-seq study of type 1 diabetes (T1D), where the coefficient of variation among subjects (n = 81) in the transcript abundance estimates was substantially reduced compared to the estimation using the full reference. EA focuses on individual transcriptional events. These events can be quantitated and analyzed directly or used to identify the probable set of expressed transcripts. Simple rules based on detected events and coverage used in filtering result in a dramatic improvement in isoform estimation without the use of ancillary data (e.g., ChIP, long reads) that may not be available for many studies.
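The filtering rule described in this abstract, keeping a transcript only if enough of its events are detected and its coverage is adequate, can be sketched as below. This is an illustrative sketch and not the Event Analysis software: the function name, the input layout, the event identifiers, and both threshold values are assumptions, since the abstract does not state the cutoffs used.

```python
def filter_transcripts(transcript_events, event_reads,
                       min_prop=0.75, min_mean_reads=5.0):
    """Keep transcripts whose events are mostly detected and well covered.

    transcript_events: dict mapping transcript -> list of event ids
    (exonic regions and junctions) that make up the transcript.
    event_reads: dict mapping event id -> read count; missing events count as 0.
    min_prop, min_mean_reads: hypothetical thresholds chosen for illustration.
    """
    kept = []
    for tx, events in transcript_events.items():
        if not events:
            continue  # nothing to evaluate for this transcript
        counts = [event_reads.get(e, 0) for e in events]
        prop_detected = sum(1 for c in counts if c > 0) / len(counts)
        mean_reads = sum(counts) / len(counts)
        if prop_detected >= min_prop and mean_reads >= min_mean_reads:
            kept.append(tx)
    return kept


# Hypothetical example: tx2 relies on a junction and an exon with no reads.
tx_events = {"tx1": ["e1", "j1", "e2"], "tx2": ["e1", "j2", "e3"]}
reads = {"e1": 20, "j1": 12, "e2": 18, "j2": 0}
print(filter_transcripts(tx_events, reads))
```

The retained transcripts would then form the reduced reference used for quantitation, which is the step the abstract credits with more stable isoform abundance estimates.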