The Gene Ontology Consortium (GOC) provides the most comprehensive resource currently available for computable knowledge regarding the functions of genes and gene products. Here, we report the advances of the consortium over the past two years. The new GO-CAM annotation framework was notably improved, and we formalized the model with a computational schema to check and validate the rapidly increasing repository of 2838 GO-CAMs. In addition, we describe the impacts of several collaborations to refine GO and report a 10% increase in the number of GO annotations, a 25% increase in annotated gene products, and over 9,400 new scientific articles annotated. As the project matures, we continue our efforts to review older annotations in light of newer findings and to maintain consistency with other ontologies. As a result, 20,000 annotations derived from experimental data were reviewed, corresponding to 2.5% of experimental GO annotations. The website (http://geneontology.org) was redesigned for quick access to documentation, downloads and tools. To maintain an accurate resource and support traceability and reproducibility, we have made available a historical archive covering the past 15 years of GO data with a consistent format and file structure for both the ontology and annotations.
The Reactome Knowledgebase (https://reactome.org), an Elixir core resource, provides manually curated molecular details across a broad range of physiological and pathological biological processes in humans, including both hereditary and acquired disease processes. The processes are annotated as an ordered network of molecular transformations in a single consistent data model. Reactome thus functions both as a digital archive of manually curated human biological processes and as a tool for discovering functional relationships in data such as gene expression profiles or somatic mutation catalogs from tumor cells. Recent curation work has expanded our annotations of normal and disease-associated signaling processes and of the drugs that target them, in particular infections caused by the SARS-CoV-1 and SARS-CoV-2 coronaviruses and the host response to infection. New tools support better simultaneous analysis of high-throughput data from multiple sources and the placement of understudied (‘dark’) proteins from analyzed datasets in the context of Reactome’s manually curated pathways.
Background: Reactome aims to provide bioinformatics tools for visualisation, interpretation and analysis of pathway knowledge to support basic research, genome analysis, modelling, systems biology and education. Pathway analysis methods have a broad range of applications in physiological and biomedical research; one of the main problems, from the point of view of analysis performance, is the constantly increasing size of the data samples. Results: Here, we present a new high-performance in-memory implementation of the well-established over-representation analysis method. To achieve the target, the over-representation analysis method is divided into four different steps and, for each of them, specific data structures are used to improve performance and minimise the memory footprint. The first step, finding out whether an identifier in the user’s sample corresponds to an entity in Reactome, is addressed using a radix tree as a lookup table. The second step, modelling the proteins, chemicals, their orthologues in other species and their composition in complexes and sets, is addressed with a graph. The third and fourth steps, which aggregate the results and calculate the statistics, are solved with a double-linked tree. Conclusion: Through the use of highly optimised, in-memory data structures and algorithms, Reactome has achieved a stable, high-performance pathway analysis service, enabling the analysis of genome-wide datasets within seconds and allowing interactive exploration and analysis of high-throughput data. The proposed pathway analysis approach is available in the Reactome production web site either via the AnalysisService for programmatic access or the user submission interface integrated into the PathwayBrowser. Reactome is an open data and open source project and all of its source code, including the code described here, is available in the AnalysisTools repository in the Reactome GitHub (https://github.com/reactome/).
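The steps described in this abstract can be illustrated with a small sketch. This is not Reactome's implementation: the class and function names are hypothetical, the lookup uses a plain character trie rather than a compressed radix tree, the graph and tree steps are collapsed into simple set operations, and the hypergeometric tail probability is an assumed choice of statistic, since the abstract does not say which test is used.

```python
from math import comb


class Trie:
    """Plain character trie (hypothetical stand-in for a radix tree,
    which would additionally merge chains of single-child nodes)."""

    def __init__(self):
        self.children = {}
        self.value = None

    def insert(self, key, value):
        node = self
        for ch in key:
            node = node.children.setdefault(ch, Trie())
        node.value = value

    def get(self, key):
        node = self
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.value


def hypergeom_tail(k, population, successes, draws):
    """P(X >= k) when drawing `draws` items without replacement from a
    population that contains `successes` pathway members."""
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(population, draws)


def over_representation(sample_ids, lookup, pathways, universe_size):
    """Map identifiers to entities, then score each pathway."""
    # Step 1: identifier lookup; unknown identifiers are dropped.
    found = {lookup.get(i) for i in sample_ids} - {None}
    results = {}
    for name, members in pathways.items():
        # Steps 3 and 4 (aggregation and statistics), greatly simplified.
        hits = len(found & members)
        p = hypergeom_tail(hits, universe_size, len(members), len(found))
        results[name] = (hits, p)
    return results


# Toy data: identifiers and pathway contents are invented for illustration.
lookup = Trie()
for ident, entity in [("P1", "e1"), ("P2", "e2"), ("P3", "e3")]:
    lookup.insert(ident, entity)

pathways = {"A": {"e1", "e2", "e3"}, "B": {"e4", "e5"}}
results = over_representation(["P1", "P2", "unknown"], lookup, pathways, 10)
```

With this toy universe of 10 entities, pathway A receives two hits from a mapped sample of two, while pathway B receives none, so its tail probability is 1.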
SH2 domain proteins are important components of the signal transduction pathways activated by growth factor receptor tyrosine kinases. We have been cloning SH2 domain proteins by bacterial expression cloning using the tyrosine phosphorylated C‐terminus of the epidermal growth factor receptor as a probe. One of these newly cloned SH2 domain proteins, GRB‐7, was mapped on mouse chromosome 11 to a region which also contains the tyrosine kinase receptor, HER2/erbB‐2. The analogous chromosomal locus in man is often amplified in human breast cancer leading to overexpression of HER2. We find that GRB‐7 is amplified in concert with HER2 in several breast cancer cell lines and that GRB‐7 is overexpressed in both cell lines and breast tumors. GRB‐7, through its SH2 domain, binds tightly to HER2 such that a large fraction of the tyrosine phosphorylated HER2 in SKBR‐3 cells is bound to GRB‐7. GRB‐7 can also bind tyrosine phosphorylated SHC, albeit at a lower affinity than GRB2 binds SHC. We also find that GRB‐7 has a strong similarity over > 300 amino acids to a newly identified gene in Caenorhabditis elegans. This region of similarity, which lies outside the SH2 domain, also contains a pleckstrin homology domain. The presence of evolutionarily conserved domains indicates that GRB‐7 is likely to perform a basic signaling function. The fact that GRB‐7 and HER2 are both overexpressed and bound tightly together suggests that this basic signaling pathway is greatly amplified in certain breast cancers.
A mixed-oligonucleotide probe was used to identify four ras-like coding sequences in a human teratocarcinoma cDNA library. Two of these sequences resembled the rho genes, one was closely related to H-, K-, and N-ras, and one shared only the four sequence domains that define the ras gene superfamily. Homologs of the four genes were found in genomic DNA from a variety of mammals and from chicken. The genes were transcriptionally active in a range of human cell types. Mammalian ras genes (8, 11, 36) encode a family of proteins that show low but significant homology to the Gα subunits of G proteins (18). A number of genes encoding proteins with Mrs of 20,000 to 25,000 that share significant homology with the ras proteins have been isolated. The homology is greatest in four domains that have been shown through both mutagenic (see reference 2 for a review) and X-ray crystallographic (9, 17, 28) studies to be involved in the binding and hydrolysis of guanine nucleotides. Many ras-related proteins also contain a fifth conserved domain at their carboxy termini that, in H-, K-, and N-ras, is required for membrane localization and biological activity (13, 40). These ras-related proteins are found in a variety of eucaryotic organisms and appear to be well conserved over evolutionary time. The ras gene superfamily can be divided into several major groups on the basis of amino acid sequence: (i) the H-ras, K-ras, and N-ras proto-oncogenes (H, K, and N genes); (ii) the ral genes, which share about 50% homology with H-, K-, and N-ras (4, 5); (iii) the rap genes (29, 30) and R-ras (21), which differ significantly from each other but all share about 50 to 55% homology with the ras proteins, including strict conservation of the ras effector domain (amino acids 32 to 40 of H-ras); (iv) the rho genes, a more distantly related group that exhibits only about 35% identity with the ras proteins (23, 24); and (v)
Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency because there are performance issues associated with queries traversing highly interconnected data. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery. The adoption of this technology greatly improved query efficiency, reducing the average query time by 93%. The web service built on top of the graph database provides programmatic access to Reactome data by object oriented queries, but also supports more complex queries that take advantage of the new underlying graph-based data storage. By adopting graph database technology we are providing a high performance pathway data resource to the community. The Reactome graph database use case shows the power of NoSQL database engines for complex biological data types.
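The point about traversing highly interconnected data can be sketched with a toy in-memory graph. This is only an illustration of the idea, not Reactome's data model or its Neo4j or ContentService interfaces: the node names, the relationship names and the `reachable_proteins` helper are all hypothetical. In a graph representation, collecting every protein beneath a pathway is a single traversal along stored relationships, whereas a relational layout would typically need one join per level of nesting.

```python
from collections import deque

# Hypothetical toy graph: node -> list of (relationship, neighbour).
GRAPH = {
    "Pathway:Signaling": [("hasEvent", "Pathway:Sub"), ("hasEvent", "Reaction:R1")],
    "Pathway:Sub": [("hasEvent", "Reaction:R2")],
    "Reaction:R1": [("input", "Protein:A"), ("output", "Complex:AB")],
    "Reaction:R2": [("input", "Complex:AB"), ("output", "Protein:C")],
    "Complex:AB": [("hasComponent", "Protein:A"), ("hasComponent", "Protein:B")],
}


def reachable_proteins(graph, start):
    """Breadth-first traversal collecting every protein node reachable
    from `start`, regardless of how deeply it is nested."""
    seen = {start}
    queue = deque([start])
    proteins = set()
    while queue:
        node = queue.popleft()
        if node.startswith("Protein:"):
            proteins.add(node)
        for _relationship, neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return proteins


top_level = reachable_proteins(GRAPH, "Pathway:Signaling")
single_reaction = reachable_proteins(GRAPH, "Reaction:R1")
```

The traversal visits each node once, so its cost grows with the size of the reachable subgraph rather than with the depth of nesting.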
Previous studies in the laboratory indicated that glycosylphosphatidylinositol (GPI)-anchored proteins may generate diversity of the cell surface of different neuronal populations (Rosen et al., 1992). In this study, we have extended these findings and surveyed the expression of GPI-anchored proteins in the developing rat CNS. In addition to several well characterized GPI-anchored cell adhesion molecules (CAMs), we detected an unidentified broad band of 65 kDa that is the earliest and most abundantly expressed GPI-anchored species in the rat CNS. Purification of this protein band revealed that it is comprised of several related proteins that define a novel subfamily of immunoglobulin-like (Ig) CAMs. One of these proteins is the opiate binding-cell adhesion molecule (OBCAM). We have isolated a cDNA encoding a second member of this family, that we have termed neurotrimin, and present evidence for the existence of additional family members. Like OBCAM, with which it shares extensive sequence identity, neurotrimin contains three immunoglobulin-like domains. Both proteins are encoded by distinct genes that may be clustered on the proximal end of mouse chromosome 9. Characterization of the expression of neurotrimin and OBCAM in the developing CNS by in situ hybridization reveals that these proteins are differentially expressed during development. Neurotrimin is expressed at high levels in several developing projection systems: in neurons of the thalamus, subplate, and lower cortical laminae in the forebrain and in the pontine nucleus, cerebellar granule cells, and Purkinje cells in the hindbrain. Neurotrimin is also expressed at high levels in the olfactory bulb, neural retina, dorsal root ganglia, spinal cord, and in a graded distribution in the basal ganglia and hippocampus. OBCAM has a much more restricted distribution, being expressed at high levels principally in the cortical plate and hippocampus. 
These results suggest that these proteins, together with other members of this family, provide diversity to the surfaces of different neuronal populations that could be important in the specification of neuronal connectivity.
Genome Biology 2009, 10:402. Correction: Reactome: a knowledge base of biologic pathways and processes. The electronic version of this article is the complete one and can be found online at