The Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) is an international public repository for high-throughput microarray and next-generation sequence functional genomic data sets submitted by the research community. The resource supports archiving of raw data, processed data and metadata which are indexed, cross-linked and searchable. All data are freely available for download in a variety of formats. GEO also provides several web-based tools and strategies to assist users to query, analyse and visualize data. This article reports current status and recent database developments, including the release of GEO2R, an R-based web application that helps users analyse GEO data.
We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.
Bioconductor is an open-source, open-development software project for the analysis and comprehension of high-throughput data in genomics and molecular biology. The project aims to enable interdisciplinary research, collaboration and rapid development of scientific software. Based on the statistical programming language R, Bioconductor comprises 934 interoperable packages contributed by a large, diverse community of scientists. Packages cover a range of bioinformatic and statistical applications. They undergo formal initial review and continuous automated testing. We present an overview for prospective users and contributors.
biomaRt is a new Bioconductor package that integrates BioMart data resources with data analysis software in Bioconductor. It can annotate a wide range of gene or gene product identifiers (e.g. Entrez-Gene and Affymetrix probe identifiers) with information such as gene symbol, chromosomal coordinates, Gene Ontology and OMIM annotation. Furthermore biomaRt enables retrieval of genomic sequences and single nucleotide polymorphism information, which can be used in data analysis. Fast and up-to-date data retrieval is possible as the package executes direct SQL queries to the BioMart databases (e.g. Ensembl). The biomaRt package provides a tight integration of large, public or locally installed BioMart databases with data analysis in Bioconductor creating a powerful environment for biological data mining.
Schizophrenia is a devastating neurodevelopmental disorder whose genetic influences remain elusive. We hypothesize that individually rare structural variants contribute to the illness. Microdeletions and microduplications >100 kilobases were identified by microarray comparative genomic hybridization of genomic DNA from 150 individuals with schizophrenia and 268 ancestry-matched controls. All variants were validated by high-resolution platforms. Novel deletions and duplications of genes were present in 5% of controls versus 15% of cases and 20% of young-onset cases, both highly significant differences. The association was independently replicated in patients with childhood-onset schizophrenia as compared with their parents. Mutations in cases disrupted genes disproportionately from signaling networks controlling neurodevelopment, including neuregulin and glutamate pathways. These results suggest that multiple, individually rare mutations altering genes in neurodevelopmental pathways contribute to schizophrenia.
Mapping DNase I hypersensitive (HS) sites is an accurate method of identifying the location of genetic regulatory elements, including promoters, enhancers, silencers, insulators, and locus control regions. We employed high-throughput sequencing and whole-genome tiled array strategies to identify DNase I HS sites within human primary CD4+ T cells. Combining these two technologies, we have created a comprehensive and accurate genome-wide open chromatin map. Surprisingly, only 16%-21% of the identified 94,925 DNase I HS sites are found in promoters or first exons of known genes, but nearly half of the most open sites are in these regions. In conjunction with expression, motif, and chromatin immunoprecipitation data, we find evidence of cell-type-specific characteristics, including the ability to identify transcription start sites and locations of different chromatin marks utilized in these cells. In addition, and unexpectedly, our analyses have uncovered detailed features of nucleosome structure.
The domestic dog exhibits greater diversity in body size than any other terrestrial vertebrate. We used a strategy that exploits the breed structure of dogs to investigate the genetic basis of size. First, through a genome-wide scan, we identified a major quantitative trait locus (QTL) on chromosome 15 influencing size variation within a single breed. Second, we examined genetic variation in the 15-megabase interval surrounding the QTL in small and giant breeds and found marked evidence for a selective sweep spanning a single gene (IGF1), encoding insulin-like growth factor 1. A single IGF1 single-nucleotide polymorphism haplotype is common to all small breeds and nearly absent from giant breeds, suggesting that the same causal sequence variant is a major contributor to body size in all small dogs.
The incidence of melanoma is increasing more than any other cancer, and knowledge of its genetic alterations is limited. To systematically analyze such alterations, we performed whole-exome sequencing of 14 matched normal and metastatic tumor DNAs. Using stringent criteria, we identified 68 genes that appeared to be somatically mutated at elevated frequency, many of which are not known to be genetically altered in tumors. Most importantly, we discovered that TRRAP harbored a recurrent mutation that clustered in one position (p. Ser722Phe) in 6 out of 67 affected individuals (~4%), as well as a previously unidentified gene, GRIN2A, which was mutated in 33% of melanoma samples. The nature, pattern and functional evaluation of the TRRAP recurrent mutation suggest that TRRAP functions as an oncogene. Our study provides, to our knowledge, the most comprehensive map of genetic alterations in melanoma to date and suggests that the glutamate signaling pathway is involved in this disease.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.