Alternative splicing is a tightly regulated biological process by which the number of gene products for any given gene can be greatly expanded. Genomic variants in splicing regulatory sequences can disrupt splicing and cause disease. Recent developments in sequencing technologies and computational biology have allowed researchers to investigate alternative splicing at an unprecedented scale and resolution. Population-scale transcriptome studies have revealed many naturally occurring genetic variants that modulate alternative splicing and consequently influence phenotypic variability and disease susceptibility in human populations. Innovations in experimental and computational tools such as massively parallel reporter assays and deep learning have enabled the rapid screening of genomic variants for their causal impacts on splicing. In this review, we describe technological advances that have greatly increased the speed and scale at which discoveries are made about the genetic variation of alternative splicing. We summarize major findings from population transcriptomic studies of alternative splicing and discuss the implications of these findings for human genetics and medicine.
We report an integrated database, the Compendium of Protein Lysine Modifications (CPLM; http://cplm.biocuckoo.org), for protein lysine modifications (PLMs), which occur at the active ε-amino groups of specific lysine residues in proteins and are critical for orchestrating various biological processes. CPLM was updated from our previously developed Compendium of Protein Lysine Acetylation (CPLA) database, which contained 7151 lysine acetylation sites in 3311 proteins. Here, we manually collected experimentally identified substrates and sites for 12 types of PLMs: acetylation, ubiquitination, sumoylation, methylation, butyrylation, crotonylation, glycation, malonylation, phosphoglycerylation, propionylation, succinylation and pupylation. In total, the CPLM database contains 203 972 modification events on 189 919 modified lysines in 45 748 proteins from 122 species. From this dataset, we identified a total of 76 types of co-occurrence of different PLMs on the same lysine residues. The most abundant PLM crosstalk is between acetylation and ubiquitination: up to 53.5% of acetylation events and 33.1% of ubiquitination events co-occur at 10 746 lysine sites. These PLM crosstalks suggest that a considerable proportion of lysines are competitively and dynamically regulated in a complex manner. Taken together, the CPLM database can serve as a useful resource for further research on PLMs.
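Counting co-occurrence types amounts to grouping modification events by lysine site and tallying which PLM types share a site. The sketch below assumes a flat list of hypothetical (protein, position, PLM type) records and treats each PLM at a site as one event, so the percentages are computed per site; the record layout, function names and toy data are illustrative assumptions, not CPLM's schema or pipeline.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical record layout: (protein accession, lysine position, PLM type)
Record = Tuple[str, int, str]
Site = Tuple[str, int]


def group_by_site(records: Iterable[Record]) -> Dict[Site, Set[str]]:
    """Collect the set of PLM types observed at each (protein, position) site."""
    sites: Dict[Site, Set[str]] = defaultdict(set)
    for protein, position, plm in records:
        sites[(protein, position)].add(plm)
    return dict(sites)


def cooccurrence_types(sites: Dict[Site, Set[str]]) -> Counter:
    """Count each distinct combination of two or more PLM types sharing one lysine."""
    combos: Counter = Counter()
    for plms in sites.values():
        if len(plms) >= 2:
            combos[frozenset(plms)] += 1
    return combos


def cooccurring_fraction(sites: Dict[Site, Set[str]], plm: str, partner: str) -> float:
    """Fraction of sites carrying `plm` that also carry `partner`."""
    with_plm = [plms for plms in sites.values() if plm in plms]
    if not with_plm:
        return 0.0
    shared = sum(1 for plms in with_plm if partner in plms)
    return shared / len(with_plm)


if __name__ == "__main__":
    toy = [  # made-up records for illustration only
        ("P1", 10, "acetylation"), ("P1", 10, "ubiquitination"),
        ("P1", 25, "acetylation"),
        ("P2", 5, "acetylation"), ("P2", 5, "ubiquitination"), ("P2", 5, "sumoylation"),
        ("P3", 7, "ubiquitination"), ("P3", 9, "acetylation"),
    ]
    sites = group_by_site(toy)
    print(len(cooccurrence_types(sites)))                                # 2
    print(cooccurring_fraction(sites, "acetylation", "ubiquitination"))  # 0.5
```

The fraction is deliberately asymmetric: in the toy data half of the acetylated sites are also ubiquitinated, but two thirds of the ubiquitinated sites are also acetylated, which is the same kind of asymmetry as the 53.5% versus 33.1% reported above.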
New strategies for constructing versatile nanovehicles that overcome the multiple challenges of targeted delivery are urgently needed for cancer therapy. To address this need, we developed a novel targeting-clickable and tumor-cleavable polyurethane nanomicelle for multifunctional delivery of antitumor drugs. The polyurethane was synthesized from biodegradable poly(ε-caprolactone) (PCL) and L-lysine ethyl ester diisocyanate (LDI), extended with a newly designed L-cystine-derivatized chain extender bearing a redox-responsive disulfide bond and clickable alkynyl groups (Cys-PA), and finally terminated with a detachable methoxy-poly(ethylene glycol) carrying a highly pH-sensitive benzoic-imine linkage (BPEG). The resulting polymers show attractive self-assembly characteristics and stimuli-responsiveness, good cytocompatibility, and high loading capacity for doxorubicin (DOX). Furthermore, folic acid (FA), as a model targeting ligand, was conjugated to the polyurethane micelles via an efficient click reaction. FA decoration enhanced cellular uptake and improved drug efficacy toward FA-receptor-positive HeLa cancer cells in vitro. As a proof of concept, this work provides a facile approach to designing extracellularly activatable nanocarriers for tumor-targeted and programmed intracellular drug delivery.
A major limitation of RNA-seq analysis of alternative splicing is its reliance on high sequencing coverage. We report DARTS (https://github.com/Xinglab/DARTS), a computational framework that integrates deep learning-based predictions with empirical RNA-seq evidence to infer differential alternative splicing between biological samples. DARTS leverages public RNA-seq big data to provide a knowledge base of splicing regulation via deep learning, helping researchers better characterize alternative splicing in RNA-seq datasets, even those with modest coverage.
Supplementary data are available at Bioinformatics online.
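One generic way to integrate a model's prediction with count evidence is Bayesian: treat the prediction as a prior probability that an exon is differentially spliced, then update it with a Bayes factor computed from exon-inclusion and exon-skipping read counts in the two samples. The sketch below assumes simple binomial counts with uniform Beta(1, 1) priors on each inclusion level; the function names and count layout are hypothetical, and the code illustrates the idea rather than reproducing the DARTS statistical model.

```python
import math


def log_beta(a: float, b: float) -> float:
    """log B(a, b) computed with log-gamma so large read counts do not overflow."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_bayes_factor(inc1: int, tot1: int, inc2: int, tot2: int) -> float:
    """log Bayes factor of H1 (separate inclusion levels) vs H0 (one shared level).

    Inclusion levels get a uniform Beta(1, 1) prior; the binomial coefficients are
    identical under both hypotheses and cancel, so they are omitted.
    """
    exc1, exc2 = tot1 - inc1, tot2 - inc2
    log_h1 = log_beta(inc1 + 1, exc1 + 1) + log_beta(inc2 + 1, exc2 + 1)
    log_h0 = log_beta(inc1 + inc2 + 1, exc1 + exc2 + 1)
    return log_h1 - log_h0


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function (converts log-odds to a probability)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def posterior_differential(prior: float, inc1: int, tot1: int, inc2: int, tot2: int) -> float:
    """Update a prior P(differential splicing), e.g. a classifier output, with read counts."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    if not (0 <= inc1 <= tot1 and 0 <= inc2 <= tot2):
        raise ValueError("inclusion counts must lie between 0 and the total counts")
    log_prior_odds = math.log(prior / (1.0 - prior))
    return _sigmoid(log_prior_odds + log_bayes_factor(inc1, tot1, inc2, tot2))
```

At low coverage (for example, 2 of 4 reads supporting inclusion in each sample) the Bayes factor is close to 1 (0.7 here), so the posterior largely follows the prior; with hundreds of reads the count evidence dominates. This is the intuition behind using learned predictions to help when coverage is modest.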
We present here EKPD (http://ekpd.biocuckoo.org), a hierarchical database of eukaryotic protein kinases (PKs) and protein phosphatases (PPs), the key enzymes responsible for reversible protein phosphorylation, which is involved in almost all biological processes. Because extensive experimental and computational efforts have been made to identify PKs and PPs, an integrative resource with detailed classification and annotation would be of great value to both experimentalists and computational biologists. In this work, we first collected 1855 PKs and 347 PPs from the scientific literature and various public databases. Based on previously established rationales, we classified all known PKs and PPs into a three-level hierarchy: group, family and individual PK/PP. The PKs fall into 10 groups with 149 families, and the PPs into 10 groups with 33 families. We constructed 139 and 27 hidden Markov model (HMM) profiles for PK and PP families, respectively, and then systematically characterized ∼50 000 PKs and >10 000 PPs in eukaryotes. In addition, >500 PKs and >400 PPs were computationally identified by ortholog search. Finally, the online service of the EKPD database was implemented in PHP + MySQL + JavaScript.
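The three-level hierarchy and the profile-based characterization can be pictured as a small tree in which each newly characterized protein is attached under the family whose HMM profile scores it highest. The sketch below assumes the per-family bit scores have already been computed by a profile-HMM search tool and that a single global cutoff applies; the class and function names, the cutoff and the AGC/AKT and CMGC/CDK labels in the test are illustrative assumptions, not EKPD's actual pipeline or thresholds.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Family:
    """Second level of the hierarchy; `members` holds individual PKs/PPs (third level)."""
    name: str
    members: List[str] = field(default_factory=list)


@dataclass
class Group:
    """Top level of the hierarchy."""
    name: str
    families: Dict[str, Family] = field(default_factory=dict)


def classify(protein_id: str, scores: Dict[str, float], family_to_group: Dict[str, str],
             groups: Dict[str, Group], cutoff: float) -> Optional[str]:
    """Place a protein under its best-scoring family profile if that score passes `cutoff`.

    `scores` maps family name -> profile-HMM bit score (assumed precomputed elsewhere).
    Returns the assigned family name, or None if no profile scores high enough.
    """
    if not scores:
        return None
    family, best = max(scores.items(), key=lambda item: item[1])
    if best < cutoff:
        return None
    group_name = family_to_group[family]
    group = groups.setdefault(group_name, Group(group_name))
    group.families.setdefault(family, Family(family)).members.append(protein_id)
    return family
```

For example, with made-up scores `{"AKT": 250.0, "CDK": 120.0}` and a cutoff of 50, a sequence is filed under the AKT family of the AGC group. Family-specific cutoffs would be a natural refinement of the single global threshold assumed here.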
Whole genome sequencing (WGS) is a promising strategy for uncovering variants or genes responsible for human diseases and traits. However, robust platforms for comprehensive downstream analysis are lacking. In the present study, we first proposed three novel algorithms (sequence gap-filled gene feature annotation, bit-block encoded genotypes, and sectional fast access to text lines) to address three fundamental problems. These three algorithms then formed the infrastructure of KGGSeq, a robust parallel computing framework that integrates downstream analysis functions for WGS data. KGGSeq is equipped with a comprehensive set of analysis functions for quality control, filtration, annotation, pathogenic prediction and statistical tests. In tests with WGS data from the 1000 Genomes Project, KGGSeq annotated several thousand more reliable non-synonymous variants than other widely used tools (e.g. ANNOVAR and SnpEff). It took only about half an hour on a small server with 10 CPUs to access the genotypes of ∼60 million variants in 2504 subjects, whereas a popular alternative tool required about one day. KGGSeq's bit-block genotype format required 1.5% or less of the storage space to flexibly represent phased or unphased genotypes with multiple alleles, and calculated genotypic correlations over 1000 times faster.
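The bit-block idea, storing genotypes as bit vectors so that statistics reduce to bitwise AND and population counts, can be illustrated with a generic two-bit-plane encoding. The sketch below assumes biallelic, unphased dosages (0, 1 or 2 alternate alleles) with no missing calls; it is not KGGSeq's actual format, which per the description above also covers phased and multi-allelic genotypes. Pearson's r is computed from the planes with the standard identity r = (nΣxy − ΣxΣy) / √((nΣx² − (Σx)²)(nΣy² − (Σy)²)).

```python
import math
from typing import List, Tuple

BitPlanes = Tuple[int, int]  # (heterozygous plane, homozygous-alternate plane)


def encode(dosages: List[int]) -> BitPlanes:
    """Pack biallelic unphased dosages (0, 1, 2) into two bit-planes.

    Bit i of `het` is set when subject i carries one alternate allele;
    bit i of `hom` is set when it carries two. Dosage 0 sets no bit.
    """
    het = hom = 0
    for i, g in enumerate(dosages):
        if g == 1:
            het |= 1 << i
        elif g == 2:
            hom |= 1 << i
        elif g != 0:
            raise ValueError(f"unsupported dosage {g!r} at subject {i}")
    return het, hom


def popcount(x: int) -> int:
    """Number of set bits (portable alternative to int.bit_count on Python >= 3.10)."""
    return bin(x).count("1")


def correlation(a: BitPlanes, b: BitPlanes, n: int) -> float:
    """Pearson r between two variants' dosages using only AND and popcount.

    `n` is the number of subjects, needed because dosage-0 subjects set no bits.
    """
    het_a, hom_a = a
    het_b, hom_b = b
    sum_a = popcount(het_a) + 2 * popcount(hom_a)
    sum_b = popcount(het_b) + 2 * popcount(hom_b)
    # dosage^2 is 1 for het and 4 for hom-alt; the two planes never overlap
    sq_a = popcount(het_a) + 4 * popcount(hom_a)
    sq_b = popcount(het_b) + 4 * popcount(hom_b)
    cross = (popcount(het_a & het_b) + 2 * popcount(het_a & hom_b)
             + 2 * popcount(hom_a & het_b) + 4 * popcount(hom_a & hom_b))
    num = n * cross - sum_a * sum_b
    var_a = n * sq_a - sum_a ** 2
    var_b = n * sq_b - sum_b ** 2
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a monomorphic variant")
    return num / math.sqrt(var_a * var_b)
```

Each variant costs two bits per subject, and correlating two variants takes a handful of word-parallel AND and popcount operations instead of a per-subject loop. That is the general source of speedups for bit-packed formats; the 1000-fold figure above is specific to KGGSeq and is not implied by this sketch.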
Recent studies have indicated that different post-translational modifications (PTMs) synergistically orchestrate specific biological processes through crosstalk. However, the preferences of crosstalk among different PTMs and the evolutionary constraints on PTM crosstalk require further dissection. In this study, we systematically analyzed in situ crosstalk at the same positions among three tyrosine PTMs: sulfation, nitration and phosphorylation. Experimentally identified sulfation, nitration and phosphorylation sites were collected and integrated with reliable predictions to perform large-scale analyses of in situ crosstalk. We observed that in situ crosstalk between sulfation and nitration is significantly under-represented, whereas both sulfation and nitration preferentially co-occupy the same tyrosines as phosphorylation. Further analyses suggested that sulfation and nitration preferentially co-occur with phosphorylation at specific positions in proteins and participate in distinct biological processes and functions. Interestingly, long-term evolutionary analysis indicated that tyrosines targeted by multiple PTMs are not more conserved than singly modified ones. Likewise, analysis of human genetic variation showed no additional functional constraint at multiply modified tyrosines with respect to inherited disease, cancer or rare mutations. Taken together, our systematic analyses provide a better understanding of in situ crosstalk among PTMs.
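The abstract does not name the statistical test behind "significantly under-represented"; one standard way to ask whether two modifications share sites more or less often than chance is the hypergeometric test. The sketch below assumes a fixed universe of candidate tyrosines and treats the two PTM site sets as random draws from it; the function name and the counts in the test are made up, and this is not necessarily the authors' method.

```python
from math import comb
from typing import Tuple


def cooccurrence_test(total: int, n_a: int, n_b: int, observed: int) -> Tuple[float, float, float]:
    """Hypergeometric test for overlap between two PTM site sets.

    total: number of candidate tyrosines (the universe)
    n_a, n_b: number of sites carrying PTM A and PTM B
    observed: number of sites carrying both
    Returns (expected overlap, P(overlap <= observed), P(overlap >= observed)).
    """
    if total <= 0 or not (0 <= n_a <= total and 0 <= n_b <= total):
        raise ValueError("need total > 0 and site counts between 0 and total")
    lo, hi = max(0, n_a + n_b - total), min(n_a, n_b)
    denom = comb(total, n_b)

    def pmf(k: int) -> float:
        # choose k of the A-sites and n_b - k of the non-A sites to be B-sites
        return comb(n_a, k) * comb(total - n_a, n_b - k) / denom

    p_under = sum(pmf(k) for k in range(lo, min(observed, hi) + 1))
    p_over = sum(pmf(k) for k in range(max(observed, lo), hi + 1))
    return n_a * n_b / total, p_under, p_over
```

With 100 tyrosines, 10 sulfated and 10 nitrated, about one co-modified site is expected by chance, and observing none gives P(overlap ≤ 0) ≈ 0.33, which is far from significant; detecting genuine under-representation therefore requires much larger site counts, which is one reason to combine experimental sites with predictions as described above.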