Enhancers are distal cis-regulatory elements that activate the transcription of their target genes. They regulate a wide range of important biological processes, including embryogenesis, development, and homeostasis. As more large-scale technologies have been developed for enhancer identification, a comprehensive database that annotates enhancers from genome-wide profiling datasets across species has become highly desirable. Here, we present an updated database, EnhancerAtlas 2.0 (http://www.enhanceratlas.org/indexv2.php), covering 586 tissue/cell types across nine species, including a large number of normal tissues, cancer cell lines and cells at different developmental stages. Overall, the database contains 13 494 603 enhancers, obtained from 16 055 datasets generated with 12 high-throughput experimental methods (e.g. H3K4me1/H3K27ac, DNase-seq/ATAC-seq, P300, POLR2A, CAGE, ChIA-PET, GRO-seq, STARR-seq and MPRA). The updated version is a major expansion of the first version, which contained enhancers for human cells only. In addition, we predicted enhancer–target gene relationships in human, mouse and fly. Finally, users can search enhancers and enhancer–target gene relationships through five user-friendly, interactive modules. We believe the new enhancer annotation in EnhancerAtlas 2.0 will help users perform functional analyses of enhancers in various genomes.
Motivation: Multiple high-throughput approaches have recently been developed that allow enhancers to be discovered on a genome scale in a single experiment. However, the datasets generated by these approaches are not fully utilized by the research community because of technical challenges such as the lack of a consensus enhancer annotation and of integrative analytic tools. Results: We developed an interactive database, EnhancerAtlas, which contains an atlas of 2,534,123 enhancers for 105 cell/tissue types. A consensus enhancer annotation was obtained for each cell/tissue type by summing independent experimental datasets, each scaled by a relative weight derived from a cross-validation approach. Moreover, EnhancerAtlas provides a set of analytic tools that allow users to query and compare enhancers in a particular genomic region or associated with a gene of interest, and to assign enhancers and their target genes from a custom dataset. Availability and Implementation: The database with analytic tools is available at http://www.enhanceratlas.org/.
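The consensus annotation above rests on a weighted sum of independent datasets. The sketch below illustrates that idea under stated assumptions: every dataset has already been reduced to a signal over the same list of genomic bins, the per-dataset weights are given (EnhancerAtlas derives them by cross-validation, which is not reproduced here), and high-scoring runs of bins are merged into regions with a simple threshold. The function names, bin representation, weights and threshold are hypothetical and are not EnhancerAtlas's actual implementation.

```python
from typing import Dict, List, Tuple


def consensus_scores(tracks: Dict[str, List[float]],
                     weights: Dict[str, float]) -> List[float]:
    """Weighted sum of per-dataset signals, bin by bin.

    `tracks` maps a dataset name to its signal over a shared list of genomic
    bins; datasets missing from `weights` contribute nothing (weight 0).
    """
    if not tracks:
        return []
    n_bins = len(next(iter(tracks.values())))
    scores = [0.0] * n_bins
    for name, signal in tracks.items():
        if len(signal) != n_bins:
            raise ValueError(f"track {name!r} has {len(signal)} bins, expected {n_bins}")
        w = weights.get(name, 0.0)
        for i, value in enumerate(signal):
            scores[i] += w * value
    return scores


def call_regions(scores: List[float], threshold: float) -> List[Tuple[int, int]]:
    """Merge runs of consecutive bins with score >= threshold into
    half-open (start_bin, end_bin) intervals."""
    regions: List[Tuple[int, int]] = []
    start = None
    for i, s in enumerate(scores):
        if s >= threshold and start is None:
            start = i                      # a new high-scoring run begins
        elif s < threshold and start is not None:
            regions.append((start, i))     # the current run ends before bin i
            start = None
    if start is not None:
        regions.append((start, len(scores)))
    return regions


# Toy example: three datasets over six bins.
# Illustrative weights only; they are not EnhancerAtlas's cross-validated weights.
tracks = {
    "H3K27ac":   [0, 1, 1, 0, 0, 1],
    "DNase-seq": [0, 1, 1, 1, 0, 0],
    "P300":      [0, 0, 1, 0, 0, 0],
}
weights = {"H3K27ac": 0.5, "DNase-seq": 0.25, "P300": 0.25}
scores = consensus_scores(tracks, weights)
print(scores)                      # [0.0, 0.75, 1.0, 0.25, 0.0, 0.5]
print(call_regions(scores, 0.5))   # [(1, 3), (5, 6)]
```

Because every dataset contributes linearly, a region supported by several moderately weighted assays can pass the threshold even when no single assay would.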
We report the Compendium of Protein Lysine Modifications (CPLM; http://cplm.biocuckoo.org), an integrated database of protein lysine modifications (PLMs), which occur at the active ε-amino groups of specific lysine residues in proteins and are critical for orchestrating various biological processes. CPLM was updated from our previously developed Compendium of Protein Lysine Acetylation (CPLA) database, which contained 7151 lysine acetylation sites in 3311 proteins. Here, we manually collected experimentally identified substrates and sites for 12 types of PLMs: acetylation, ubiquitination, sumoylation, methylation, butyrylation, crotonylation, glycation, malonylation, phosphoglycerylation, propionylation, succinylation and pupylation. In total, the CPLM database contains 203 972 modification events on 189 919 modified lysines in 45 748 proteins from 122 species. From this dataset, we identified 76 types of co-occurrence of different PLMs on the same lysine residues, and the most abundant PLM crosstalk is between acetylation and ubiquitination: up to 53.5% of acetylation events and 33.1% of ubiquitination events co-occur at 10 746 lysine sites. These PLM crosstalks suggest that a considerable proportion of lysines are competitively and dynamically regulated in a complicated manner. Taken together, the CPLM database can serve as a useful resource for further research on PLMs.
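The co-occurrence figures above come from grouping modification events by lysine site. The sketch below shows one way to compute such statistics. It assumes a hypothetical input format of (protein accession, lysine position, PLM type) events, treats a "co-occurrence type" as a distinct set of two or more PLMs on one lysine, and assumes at most one event per PLM type per lysine, so that event and site fractions coincide. The toy events are invented for illustration and are not CPLM data.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

Site = Tuple[str, int]          # (protein accession, lysine position)
Event = Tuple[str, int, str]    # (protein accession, lysine position, PLM type)


def mods_per_site(events: Iterable[Event]) -> Dict[Site, Set[str]]:
    """Group modification events by lysine site."""
    sites: Dict[Site, Set[str]] = defaultdict(set)
    for protein, position, plm in events:
        sites[(protein, position)].add(plm)
    return dict(sites)


def cooccurrence_types(sites: Dict[Site, Set[str]]) -> Counter:
    """Count each distinct combination of two or more PLMs seen on one lysine."""
    return Counter(frozenset(plms) for plms in sites.values() if len(plms) >= 2)


def shared_fraction(sites: Dict[Site, Set[str]], plm_a: str, plm_b: str) -> float:
    """Fraction of lysines carrying plm_a that also carry plm_b."""
    with_a = [plms for plms in sites.values() if plm_a in plms]
    if not with_a:
        return 0.0
    return sum(plm_b in plms for plms in with_a) / len(with_a)


# Hypothetical toy events, not CPLM data.
events = [
    ("P1", 10, "acetylation"), ("P1", 10, "ubiquitination"),
    ("P1", 25, "acetylation"),
    ("P2", 7, "ubiquitination"), ("P2", 7, "sumoylation"),
    ("P2", 40, "acetylation"), ("P2", 40, "ubiquitination"), ("P2", 40, "methylation"),
]
sites = mods_per_site(events)
print(len(cooccurrence_types(sites)))                            # 3
print(shared_fraction(sites, "acetylation", "ubiquitination"))   # 0.6666666666666666
```

Note that the fraction is asymmetric in general: the share of acetylated lysines that are also ubiquitinated differs from the reverse whenever the two PLMs have different numbers of sites, which is why the abstract reports separate percentages for each.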
In this work, we developed UUCD (http://uucd.biocuckoo.org), a family-based database for ubiquitin and ubiquitin-like conjugation. This is one of the most important post-translational modifications and regulates a variety of cellular processes through a similar E1 (ubiquitin-activating enzyme)–E2 (ubiquitin-conjugating enzyme)–E3 (ubiquitin-protein ligase) enzyme thioester cascade. Although extensive experimental efforts have been made, an integrative data resource was not yet available. From the scientific literature, 26 E1s, 105 E2s, 1003 E3s and 148 deubiquitination enzymes (DUBs) were collected and classified into 1, 3, 19 and 7 families, respectively. To computationally characterize potential enzymes in eukaryotes, we constructed 1, 1, 15 and 6 family-level hidden Markov model (HMM) profiles for E1s, E2s, E3s and DUBs, respectively. In addition, ortholog searches were conducted for the E3 and DUB families without HMM profiles. The UUCD database was then populated with 738 E1s, 2937 E2s, 46 631 E3s and 6647 DUBs from 70 eukaryotic species, together with detailed annotations and classifications. The online service of UUCD was implemented in PHP + MySQL + JavaScript + Perl.
We present here EKPD (http://ekpd.biocuckoo.org), a hierarchical database of eukaryotic protein kinases (PKs) and protein phosphatases (PPs), the key enzymes responsible for reversible protein phosphorylation, which is involved in almost all biological processes. As extensive experimental and computational efforts have been made to identify PKs and PPs, an integrative resource with detailed classification and annotation information would be of great value to both experimentalists and computational biologists. In this work, we first collected 1855 PKs and 347 PPs from the scientific literature and various public databases. Based on previously established rationales, we classified all of the known PKs and PPs into a hierarchy with three levels: group, family and individual PK/PP. There are 10 groups with 149 families for the PKs and 10 groups with 33 families for the PPs. We constructed 139 and 27 hidden Markov model (HMM) profiles for the PK and PP families, respectively. Then we systematically characterized ∼50 000 PKs and >10 000 PPs in eukaryotes. In addition, >500 PKs and >400 PPs were computationally identified by ortholog search. Finally, the online service of the EKPD database was implemented in PHP + MySQL + JavaScript.
Recent studies have indicated that different post-translational modifications (PTMs) synergistically orchestrate specific biological processes through crosstalk. However, the preferences of crosstalk among different PTMs and the evolutionary constraints on PTM crosstalk need further dissection. In this study, the in situ crosstalk at the same positions among three tyrosine PTMs, namely sulfation, nitration and phosphorylation, was systematically analyzed. Experimentally identified sulfation, nitration and phosphorylation sites were collected and integrated with reliable predictions to perform large-scale analyses of in situ crosstalk. We observed that in situ crosstalk between sulfation and nitration is significantly under-represented, whereas both sulfation and nitration tend to occupy the same tyrosines as phosphorylation. Further analyses suggested that sulfation and nitration preferentially co-occur with phosphorylation at specific positions in proteins and participate in distinct biological processes and functions. Interestingly, long-term evolutionary analysis indicated that tyrosines targeted by multiple PTMs are not more conserved than singly modified ones. Likewise, analysis of human genetic variations showed no additional functional constraint on inherited-disease, cancer or rare mutations at multiply modified tyrosines. Taken together, our systematic analyses provide a better understanding of in situ crosstalk among PTMs.
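Claims that one PTM pair is under-represented and another over-represented at shared tyrosines are statements about observed versus expected overlap. As a sketch, the code below applies a one-sided hypergeometric test to such an overlap, assuming a fixed universe of N candidate tyrosines, K carrying one PTM, n carrying the other and k carrying both. This is a common choice for set-overlap questions but is not necessarily the statistic used in the study; the function names and toy counts are hypothetical.

```python
from math import comb
from typing import Tuple


def _tail_numerator(lo: int, hi: int, N: int, K: int, n: int) -> int:
    """Exact integer sum of comb(K, i) * comb(N - K, n - i) for i in [lo, hi]."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(lo, hi + 1))


def overlap_test(N: int, K: int, n: int, k: int) -> Tuple[float, float, float]:
    """One-sided hypergeometric tests for the overlap of two site sets.

    N: candidate tyrosines in the universe
    K: tyrosines carrying PTM A (e.g. sulfation)
    n: tyrosines carrying PTM B (e.g. nitration)
    k: tyrosines carrying both
    Returns (expected overlap, P(X <= k), P(X >= k)).
    """
    if not (0 < N and 0 <= K <= N and 0 <= n <= N and 0 <= k <= min(K, n)):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    expected = K * n / N
    # Small P(X <= k): fewer shared sites than chance (under-representation).
    p_under = _tail_numerator(0, k, N, K, n) / total
    # Small P(X >= k): more shared sites than chance (over-representation).
    p_over = _tail_numerator(k, min(K, n), N, K, n) / total
    return expected, p_under, p_over


# Toy counts: 20 tyrosines, 5 sulfated, 5 nitrated, none shared.
expected, p_under, p_over = overlap_test(20, 5, 5, 0)
print(expected, round(p_under, 4), p_over)   # 1.25 0.1937 1.0
```

Summing the integer numerators before a single division keeps the tail probabilities exact up to one final rounding, which matters when real universes contain many thousands of tyrosines and the p-values are tiny.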
As a severe chronic metabolic and autoimmune disorder, type 1 diabetes (T1D) affects millions of people worldwide. Recent advances in antigen-based immunotherapy have provided a great opportunity to treat T1D with a high degree of selectivity. MHC class II I-Ag7 in the non-obese diabetic (NOD) mouse and human HLA-DQ8 have been reported to be strongly linked to T1D susceptibility. Thus, the identification of new I-Ag7 and HLA-DQ8 epitopes would greatly help further experimental and biomedical manipulation efforts. In this study, a novel software package, GPS-MBA (MHC Binding Analyzer), was developed for the prediction of I-Ag7 and HLA-DQ8 epitopes. Using experimentally identified epitopes as training data sets, we adopted and improved a previously developed GPS (Group-based Prediction System) algorithm. In extensive evaluation and comparison, GPS-MBA performed much better than other tools of this type. With this tool, we predicted a number of potentially new I-Ag7 and HLA-DQ8 epitopes. Furthermore, we designed a T1D epitope database (TEDB) for all of the experimentally identified and predicted T1D-associated epitopes. Taken together, these computational predictions and analyses provide a starting point for further experimental studies, and GPS-MBA is shown to be a useful tool for generating initial candidates for experimentalists. GPS-MBA is freely accessible to academic researchers at http://mba.biocuckoo.org.
Long-range regulation by distal enhancers is crucial for many biological processes. Existing methods for enhancer–target gene prediction often require many genomic features, which makes them difficult to apply to many cell types, for which the relevant datasets are not always available. Here, we design EAGLE, an enhancer and gene learning ensemble method for identifying enhancer–gene (EG) interactions. Unlike existing tools, EAGLE uses only six features derived from the genomic features of enhancers and from gene expression datasets. Cross-validation revealed that EAGLE outperformed other existing methods. Enrichment analyses of specific transcription factors, epigenetic modifications and eQTLs demonstrated that EAGLE could distinguish interacting pairs from non-interacting ones. Finally, EAGLE was applied to the mouse and human genomes and identified 7,680,203 and 7,437,255 EG interactions involving 31,375 and 43,724 genes and 138,547 and 177,062 enhancers across 89 and 110 tissue/cell types in mouse and human, respectively. The obtained interactions are accessible through the interactive database at enhanceratlas.org. The EAGLE method is available at https://github.com/EvansGao/EAGLE and the predicted datasets are available at http://www.enhanceratlas.org/.
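The performance comparison above relies on cross-validation. The sketch below shows generic k-fold splitting over labeled enhancer-gene pairs, in which every pair is held out for testing exactly once. It assumes pairs are plain tuples and uses a seeded random shuffle. It does not reproduce EAGLE's actual fold design, its six features or its ensemble model, and the names and toy data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_fold_splits(items: Sequence[T], k: int, seed: int = 0) -> List[Tuple[List[T], List[T]]]:
    """Return k (train, test) splits; every item is held out exactly once."""
    if not 2 <= k <= len(items):
        raise ValueError("k must be between 2 and len(items)")
    order = list(range(len(items)))
    random.Random(seed).shuffle(order)        # reproducible shuffle
    folds = [order[f::k] for f in range(k)]   # round-robin assignment to k folds
    splits = []
    for f in range(k):
        held_out = set(folds[f])
        train = [items[i] for i in order if i not in held_out]
        test = [items[i] for i in folds[f]]
        splits.append((train, test))
    return splits


# Hypothetical labeled enhancer-gene pairs: (enhancer id, gene id, interacts?)
pairs = [(f"enh{i}", f"gene{i % 3}", i % 2 == 0) for i in range(10)]
for train, test in k_fold_splits(pairs, k=5):
    # A real pipeline would fit the model on `train` and score it on `test`.
    print(len(train), len(test))   # prints "8 2" for each of the five folds
```

Because enhancers and genes recur across pairs, a purely random split like this one can let information about a shared enhancer or gene leak between training and test folds. Grouping pairs (for example by gene or chromosome) before splitting is a common safeguard.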