The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this article, we describe significant updates that we have made over the last two years to the resource. The number of sequences in UniProtKB has risen to approximately 190 million, despite continued work to reduce sequence redundancy at the proteome level. We have adopted new methods of assessing proteome completeness and quality. We continue to extract detailed annotations from the literature to add to reviewed entries and supplement these in unreviewed entries with annotations provided by automated systems such as the newly implemented Association-Rule-Based Annotator (ARBA). We have developed a credit-based publication submission interface to allow the community to contribute publications and annotations to UniProt entries. We describe how UniProtKB responded to the COVID-19 pandemic through expert curation of relevant entries that were rapidly made available to the research community through a dedicated portal. UniProt resources are available under a CC-BY (4.0) license via the web at https://www.uniprot.org/.
The Gene Ontology Consortium (GOC) provides the most comprehensive resource currently available for computable knowledge regarding the functions of genes and gene products. Here, we report the advances of the consortium over the past two years. The new GO-CAM annotation framework was notably improved, and we formalized the model with a computational schema to check and validate the rapidly increasing repository of 2838 GO-CAMs. In addition, we describe the impacts of several collaborations to refine GO and report a 10% increase in the number of GO annotations, a 25% increase in annotated gene products, and over 9,400 new scientific articles annotated. As the project matures, we continue our efforts to review older annotations in light of newer findings and to maintain consistency with other ontologies. As a result, 20,000 annotations derived from experimental data were reviewed, corresponding to 2.5% of experimental GO annotations. The website (http://geneontology.org) was redesigned for quick access to documentation, downloads and tools. To maintain an accurate resource and support traceability and reproducibility, we have made available a historical archive covering the past 15 years of GO data with a consistent format and file structure for both the ontology and annotations.
The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this publication we describe enhancements made to our data processing pipeline and to our website to adapt to an ever-increasing information content. The number of sequences in UniProtKB has risen to over 227 million and we are working towards including a reference proteome for each taxonomic group. We continue to extract detailed annotations from the literature to update or create reviewed entries, while unreviewed entries are supplemented with annotations provided by automated systems using a variety of machine-learning techniques. In addition, the scientific community continues their contributions of publications and annotations to UniProt entries of their interest. Finally, we describe our new website (https://www.uniprot.org/), designed to enhance our users’ experience and make our data easily accessible to the research community. This interface includes access to AlphaFold structures for more than 85% of all entries as well as improved visualisations for subcellular localisation of proteins.
Purpose: Current methods of classification of astrocytoma based on histopathologic methods are often subjective and less accurate. Although patients with glioblastoma have grave prognosis, significant variability in patient outcome is observed. Therefore, the aim of this study was to identify glioblastoma diagnostic and prognostic markers through microarray analysis. Experimental Design: We carried out transcriptome analysis of 25 diffusely infiltrating astrocytoma samples [WHO grade II, diffuse astrocytoma; grade III, anaplastic astrocytoma; and grade IV, glioblastoma (GBM)] using cDNA microarrays containing 18,981 genes. Several of the markers identified were also validated by real-time reverse transcription quantitative PCR and immunohistochemical analysis on an independent set of tumor samples (n = 100). Survival analysis was carried out for two markers on another independent set of retrospective cases (n = 51). Results: We identified several differentially regulated grade-specific genes. Independent validation by real-time reverse transcription quantitative PCR analysis found growth arrest and DNA-damage-inducible α (GADD45α) and follistatin-like 1 (FSTL1) to be up-regulated in most GBMs (both primary and secondary), whereas superoxide dismutase 2 and adipocyte enhancer binding protein 1 were up-regulated in the majority of primary GBM. Further, identification of the grade-specific expression of GADD45α and FSTL1 by immunohistochemical staining reinforced our findings. Analysis of retrospective GBM cases with known survival data revealed that cytoplasmic overexpression of GADD45α conferred better survival while the coexpression of FSTL1 with p53 was associated with poor survival. Conclusions: Our study reveals that GADD45α and FSTL1 are GBM-specific whereas superoxide dismutase 2 and adipocyte enhancer binding protein 1 are primary GBM-specific diagnostic markers. Whereas GADD45α overexpression confers a favorable prognosis, FSTL1 overexpression is a hallmark of poor prognosis in GBM patients.
Activator protein 2α (AP-2α) is a sequence-specific DNA binding transcription factor that is required for normal growth and morphogenesis (1-3). AP-2α has a conserved C-terminal DNA binding motif with an integral helix-span-helix homodimerization motif and a less-conserved proline and aromatic amino acid-rich helix transactivation domain near the N terminus and binds to a consensus DNA sequence, 5′-GCCNNNGGC-3′ (4-8). AP-2α has been shown to regulate many genes involved in a variety of biological functions (9). Several lines of evidence indicate that AP-2α may act as a tumor suppressor gene. The AP-2α gene is located in chromosome position 6p22, a region of frequent loss of heterozygosity in breast and other cancers (10). Diminished AP-2α function has been correlated with N-Ras oncogene-mediated transformation (11). The functions of AP-2α have been shown to be regulated by SV40 T antigen and adenovirus E1A oncoproteins (1, 12). In addition, reduction or loss of AP-2α expression has been reported in human cancers of breast, ovary, colon, skin, brain, and prostate (13-20). In good correlation, expression of dominant negative mutant AP-2α resulted in increased invasiveness and tumorigenicity (21). Overexpression of AP-2α by transient transfection into cultured cells has been shown to induce p21WAF1/CIP1 and inhibit cellular DNA synthesis and colony formation (22). Significant correlation between AP-2α expression and p21WAF1/CIP1 has been observed in breast cancer, colorectal cancer, and malignant melanoma (15, 18, 23). The growth inhibitory activity of AP-2α has also been shown to be mediated through direct interaction with p53 (24). Results from our laboratory suggest that adenovirus-mediated overexpression of AP-2α inhibits growth of cancer cells by inhibiting cellular DNA synthesis and inducing apoptosis (25).
Furthermore, our recent work establishes that AP-2α is induced in cancer cells upon treatment with chemotherapeutic drugs, which contributes to chemosensitivity because the simultaneous inhibition of AP-2α by siRNA increases the chemoresistance. In addition, the re-expression of epigenetically silenced AP-2α in breast cancer resulted in enhanced chemosensitivity and loss of tumorigenicity upon chemotherapy in an AP-2α-dependent manner (26). These results point out the importance of apoptosis induction by AP-2α for its functions, particularly the role in chemosensitivity. The molecular mechanism by which AP-2α induces apoptosis is not known. In the present study we have analyzed the pathways and various molecules involved in AP-2α-induced apoptosis. We found that AP-2α-induced apoptosis requires primarily the mitochondrial pathway involving a bax/cytochrome c/Apaf1/caspase 9-dependent mechanism. We also found that AP-2α binds to the Bcl-2 promoter leading to its transcriptional down-regulation and is essential for the apoptosis induction by AP-2α. In addition, we provide evidence that the overexpressed AP-2α (perhaps functionally inactive) in certain breast cancer cells can be made functio...
Advances in high-throughput sequencing have led to an unprecedented growth in genome sequences being submitted to biological databases. In particular, the sequencing of large numbers of nearly identical bacterial genomes during infection outbreaks and for other large-scale studies has resulted in a high level of redundancy in nucleotide databases and consequently in the UniProt Knowledgebase (UniProtKB). Redundancy negatively impacts database searches by causing slower searches, an increase in statistical bias and cumbersome result analysis. The redundancy combined with the large data volume increases the computational costs for most reuses of UniProtKB data. All of this poses challenges for effective discovery in this wealth of data. With the continuing development of sequencing technologies, it is clear that finding ways to minimize redundancy is crucial to maintaining UniProt's essential contribution to data interpretation by our users. We have developed a methodology to identify and remove highly redundant proteomes from UniProtKB. The procedure identifies redundant proteomes by performing pairwise alignments of sets of sequences for pairs of proteomes and subsequently applies graph theory to find dominating sets that provide a set of non-redundant proteomes with a minimal loss of information. This method was implemented for bacteria in mid-2015, resulting in a removal of 50 million proteins in UniProtKB. With every new release, this procedure is used to filter new incoming proteomes, resulting in a more scalable and scientifically valuable growth of UniProtKB. Database URL: http://www.uniprot.org/proteomes/
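The dominating-set idea in the abstract above can be illustrated with a small sketch. This is not the UniProt implementation: the abstract does not say which algorithm or similarity cut-off is used, so the greedy approximation, the function names, the proteome IDs, the similarity scores and the 0.95 threshold below are all assumptions made for illustration. Proteomes are nodes, an edge joins two proteomes whose pairwise similarity meets the threshold, and the kept proteomes form a set such that every removed proteome is adjacent to at least one kept proteome.

```python
def build_redundancy_graph(similarity, threshold):
    """Turn pairwise similarity scores into an undirected adjacency map.

    similarity: dict mapping (proteome_a, proteome_b) -> score in [0, 1]
                (hypothetical scores standing in for alignment results).
    threshold:  assumed cut-off above which two proteomes count as redundant.
    """
    neighbors = {}
    for (a, b), score in similarity.items():
        neighbors.setdefault(a, set())
        neighbors.setdefault(b, set())
        if score >= threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def greedy_dominating_set(neighbors):
    """Greedy approximation of a small dominating set.

    Repeatedly keeps the proteome that covers the most not-yet-covered
    proteomes (itself plus its redundant neighbors). Ties are broken by
    ID order so the result is deterministic.
    """
    uncovered = set(neighbors)
    kept = []
    while uncovered:
        best = max(
            sorted(neighbors),
            key=lambda p: len(({p} | neighbors[p]) & uncovered),
        )
        kept.append(best)
        uncovered -= {best} | neighbors[best]
    return kept


# Hypothetical example: P1, P2, P3 are near-identical; P4 and P5 are a second cluster.
scores = {
    ("P1", "P2"): 0.99,
    ("P1", "P3"): 0.98,
    ("P2", "P3"): 0.97,
    ("P3", "P4"): 0.40,
    ("P4", "P5"): 0.995,
}
graph = build_redundancy_graph(scores, threshold=0.95)
kept = greedy_dominating_set(graph)
removed = sorted(set(graph) - set(kept))
print("kept:", kept)
print("removed as redundant:", removed)
```

In this toy input one representative is kept per cluster, and every removed proteome has a kept neighbor, which is the sense in which information loss stays small. Finding a minimum dominating set is NP-hard in general, so a greedy heuristic like this one trades optimality for speed.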