ArrayExpress is a public database for high-throughput functional genomics data. ArrayExpress consists of two parts: the ArrayExpress Repository, which is a MIAME-supportive public archive of microarray data, and the ArrayExpress Data Warehouse, which is a database of gene expression profiles selected from the repository and consistently re-annotated. Archived experiments can be queried by experiment attributes, such as keywords, species, array platform, authors, journals or accession numbers. Gene expression profiles can be queried by gene names and properties, such as Gene Ontology terms, and can be visualized. ArrayExpress is a rapidly growing database; it currently contains data from >50 000 hybridizations and >1 500 000 individual expression profiles. ArrayExpress supports community standards, including MIAME, MAGE-ML and, more recently, the proposal for a spreadsheet-based data exchange format, MAGE-TAB. Availability: .
Although there is only one human genome sequence, different genes are expressed in many different cell types and tissues, as well as in different developmental stages or diseases. The structure of this 'expression space' is still largely unknown, as most transcriptomics experiments focus on sampling small regions. We have constructed a global gene expression map by integrating microarray data from 5,372 human samples representing 369 different cell and tissue types, disease states and cell lines. These have been compiled in an online resource (http://www.ebi.ac.uk/gxa/array/U133A) that allows the user to search for a gene of interest and find the conditions in which it is over- or underexpressed, or, conversely, to find which genes are over- or underexpressed in a particular condition. An analysis of the structure of the expression space reveals that it can be described by a small number of distinct expression profile classes and that the first three principal components of this space have biological interpretations. The hematopoietic system, solid tissues and incompletely differentiated cell types are arranged on the first principal axis; cell lines, neoplastic samples and nonneoplastic primary tissue-derived samples are on the second principal axis; and nervous system is separated from the rest of the samples on the third axis. We also show below that most cell lines cluster together rather than with their tissues of origin. The widely used GNF Gene Expression Atlas 1,2 includes a variety of normal tissue and cell types as well as certain disease states. Many more different biological states, such as rare diseases or particular cell subtypes, exist. It is impractical for a single dedicated experiment to generate a comprehensive expression data set covering all biological conditions, partly owing to cost, but also because some conditions are studied only in specialized laboratories.
Even so, we can use computational approaches to integrate the wealth of experiments that already have been performed. Integration of independent microarray studies is challenging, as microarrays do not measure gene expression in any absolute units. Several studies have integrated single-platform 3 and cross-platform 4-6 data from single-channel oligonucleotide arrays, yielding consistent results. It has been generally accepted, however, that only data from the same platform can be reliably integrated on a quantitative level 7. Integration is also challenging because of the unavoidable complexity of sample descriptions. The Unified Medical Language System has been used to re-annotate free text-based sample descriptions 8; however, extracting information from brazma@ebi.ac.uk .
Motivation: Describing biological sample variables with ontologies is complex due to the cross-domain nature of experiments. Ontologies provide annotation solutions; however, for cross-domain investigations, multiple ontologies are needed to represent the data. These are subject to rapid change, are often not interoperable and present complexities that are a barrier to biological resource users. Results: We present the Experimental Factor Ontology, designed to meet cross-domain, application-focused use cases for gene expression data. We describe our methodology and open source tools used to create the ontology. These include tools for creating ontology mappings, ontology views, detecting ontology changes and using ontologies in interfaces to enhance querying. The application of reference ontologies to data is a key problem, and this work presents guidelines on how community ontologies can be presented in an application ontology in a data-driven way. Availability: http://www.ebi.ac.uk/efo Contact: malone@ebi.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
ArrayExpress (http://www.ebi.ac.uk/arrayexpress) consists of three components: the ArrayExpress Repository, a public archive of functional genomics experiments and supporting data; the ArrayExpress Warehouse, a database of gene expression profiles and other bio-measurements; and the ArrayExpress Atlas, a new summary database and meta-analytical tool of ranked gene expression across multiple experiments and different biological conditions. The Repository contains data from over 6000 experiments comprising approximately 200 000 assays, and the database doubles in size every 15 months. The majority of the data are array based, but other data types are included, most recently ultra high-throughput sequencing transcriptomics and epigenetic data. The Warehouse and Atlas allow users to query for differentially expressed genes by gene names and properties, experimental conditions and sample properties, or a combination of both. In this update, we describe the ArrayExpress developments over the last two years.
The Gene Expression Atlas (http://www.ebi.ac.uk/gxa) is an added-value database providing information about gene expression in different cell types, organism parts, developmental stages, disease states, sample treatments and other biological/experimental conditions. The content of this database derives from curation, re-annotation and statistical analysis of selected data from the ArrayExpress Archive of Functional Genomics Data. A simple interface allows the user to query for differential gene expression either (i) by gene names or attributes such as Gene Ontology terms, or (ii) by biological conditions, e.g. diseases, organism parts or cell types. The gene queries return the conditions where expression has been reported, while condition queries return which genes are reported to be expressed in these conditions. A combination of both query types is possible. The query results are ranked using various statistical measures and by how many independent studies in the database show the particular gene-condition association. Currently, the database contains information about more than 200 000 genes from nine species and almost 4500 biological conditions studied in over 30 000 assays from over 1000 independent studies.
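The ranking by number of independent studies described above can be pictured with a minimal sketch. This is not the Atlas implementation: the record layout, the gene and study names, and the function `rank_associations` are all hypothetical, and the statistical measures mentioned in the abstract are left out. The sketch only counts how many distinct studies report each gene-condition association and sorts by that count.

```python
from collections import defaultdict

# Hypothetical records: (gene, condition, study_id, direction).
# None of these names come from the Atlas; they are illustrative only.
records = [
    ("GENE_A", "liver", "study1", "up"),
    ("GENE_A", "liver", "study2", "up"),
    ("GENE_A", "kidney", "study1", "down"),
    ("GENE_B", "liver", "study3", "up"),
    ("GENE_A", "liver", "study2", "up"),  # repeated report from the same study
]


def rank_associations(records):
    """Rank gene-condition associations by the number of distinct studies."""
    studies = defaultdict(set)
    for gene, condition, study_id, direction in records:
        # A set ensures each study is counted once per association.
        studies[(gene, condition, direction)].add(study_id)
    # Most-supported associations first; ties are broken alphabetically.
    ranked = sorted(studies.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(key, len(ids)) for key, ids in ranked]


ranking = rank_associations(records)
for (gene, condition, direction), n_studies in ranking:
    print(gene, condition, direction, n_studies)
```

Counting distinct study identifiers, rather than raw records, is what makes the count reflect independent studies; a real system would combine it with the per-study statistics.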
ArrayExpress is a public repository for microarray data that supports the MIAME (Minimum Information About a Microarray Experiment) requirements and stores well-annotated raw and normalized data. As of November 2004, ArrayExpress contains data from ∼12 000 hybridizations covering 35 species. Data can be submitted online or directly from local databases or LIMS in a standard format, and password-protected access to prepublication data is provided for reviewers and authors. The data can be retrieved by accession number or queried by various parameters such as species, author and array platform. A facility to query experiments by gene and sample properties is provided for a growing subset of curated data that is loaded into the ArrayExpress data warehouse. Data can be visualized and analysed using Expression Profiler, the integrated data analysis tool. ArrayExpress is available at http://www.ebi.ac.uk/arrayexpress.
Gene Expression Atlas (http://www.ebi.ac.uk/gxa) is an added-value database providing information about gene expression in different cell types, organism parts, developmental stages, disease states, sample treatments and other biological/experimental conditions. The content of this database derives from curation, re-annotation and statistical analysis of selected data from the ArrayExpress Archive and the European Nucleotide Archive. A simple interface allows the user to query for differential gene expression either by gene names or attributes or by biological conditions, e.g. diseases, organism parts or cell types. Since our previous report we have made 20 monthly releases and, as of Release 11.08 (August 2011), the database supports 19 species and contains expression data measured for 19 014 biological conditions in 136 551 assays from 5598 independent studies.
Many fungi restructured their proteomes through incorporation of serine (Ser) at thousands of protein sites coded by the leucine (Leu) CUG codon. How these fungi survived this potentially lethal genetic code alteration and its relevance for their biology are not understood. Interestingly, the human pathogen Candida albicans maintains variable Ser and Leu incorporation levels at CUG sites, suggesting that this atypical codon assignment flexibility provided an effective mechanism to alter the genetic code. To test this hypothesis, we have engineered C. albicans strains to misincorporate increasing levels of Leu at protein CUG sites. Tolerance to the misincorporations was very high, and one strain accommodated the complete reversion of CUG identity from Ser back to Leu. Increasing levels of Leu misincorporation decreased growth rate, but production of phenotypic diversity on a phenotypic array probing various metabolic networks, drug resistance, and host immune cell responses was impressive. Genome resequencing revealed an increasing number of genotype changes at polymorphic sites compared with the control strain, and 80% of Leu misincorporation resulted in complete loss of heterozygosity in a large region of chromosome V. The data unveil unanticipated links between gene translational fidelity, proteome instability and variability, genome diversification, and adaptive phenotypic diversity. They also explain the high heterozygosity of the C. albicans genome and open the door to produce microorganisms with genetic code alterations for basic and applied research.
codon reassignment | evolution | tRNA
Natural alterations to the standard genetic code have been discovered in Mycoplasma (1, 2), Micrococci (3), ciliates (4), fungi (5, 6), and mitochondria (7), modifying the hypothesis of a universal genetic code (8).
Both neutral (9) and nonneutral theories (10) have been proposed to explain codon reassignments; however, experimental data to support or refute them are scarce, and genetic code alterations remain an intriguing biological puzzle. Despite this fact, it is becoming clear that genetic code alterations are associated with mutations in tRNAs and translation release factors that expand or restrict codon decoding capacity (7). In other words, alterations of translational factors have the potential to release the genetic code from its frozen state. This hypothesis is strongly supported by the widespread cotranslational incorporation of selenocysteine into the active site of selenoprotein (11) and pyrrolysine in the active site of the methyltransferases of several Methanosarcina species (12), Desulfitobacterium hafniense (13), and the gutless worm Olavius algarvensis (14). The selective advantages produced by these two amino acids are associated with evolution of proteins with unique catalytic properties. The flexibility of the genetic code is further highlighted by the in vivo incorporation of artificial amino acids into recombinant proteins of Escherichia coli, yeast, and mammalian cells using orthogonal pairs of tRNA...