The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of the human genome sequence, with confirmed gene predictions that have been integrated with external data sources, and is available either as an interactive web site or as flat files. It is also an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements from sequence analysis to data storage and visualisation. The Ensembl site is one of the leading sources of human genome sequence annotation and provided much of the analysis for the international human genome project's publication of the draft genome. The Ensembl system is being installed around the world in both companies and academic sites on machines ranging from supercomputers to laptops.
The double cubic lattice method (DCLM) is an accurate and rapid approach for numerically computing molecular surface areas (such as the solvent accessible or van der Waals surface) and the volume and compactness of molecular assemblies, and for generating dot surfaces. The algorithm has no special memory requirements and can be easily implemented. The computation speed is extremely high, making interactive calculation of surfaces, volumes, and dot surfaces for systems of 1000 or more atoms possible on single-processor workstations. The algorithm can be easily parallelized. The DCLM is an algorithmic variant of the approach proposed by Shrake and Rupley (J. Mol. Biol., 79, 351-371, 1973). However, the application of two cubic lattices, one for grouping neighboring atomic centers and the other for grouping neighboring surface dots of an atom, results in a drastic reduction of central processing unit (CPU) time consumption by avoiding redundant distance checks. This is most noticeable for compact conformations. For instance, the calculation of the solvent accessible surface area of the crystal conformation of bovine pancreatic trypsin inhibitor (entry 4PTI of the Brookhaven Protein Data Bank, 362-point sphere for all 454 nonhydrogen atoms) takes less than 1 second (on a single R3000 processor of an SGI 4D/480, about 5 MFLOP). The DCLM does not depend on the spherical point distribution applied. The quality of unit sphere tessellations is discussed. We propose new ways of subdivision based on the icosahedron and dodecahedron, which achieve constantly low ratios of longest to shortest arcs over the whole frequency range. The DCLM is the method of choice, especially for large molecular complexes and high point densities. Its speed has been compared to the fastest techniques known to the authors, and it was found to be superior, especially when also taking into account the small memory requirement and the flexibility of the algorithm. The program text may be obtained on request.
*Author to whom all correspondence should be addressed at European Molecular Biology Laboratory, Postfach 10.2209, Meyerhofstrasse 1, D-69012 Heidelberg, Germany. Frank Eisenhaber is a visiting scientist at the EMBL, Heidelberg.
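The lattice idea in the DCLM abstract can be illustrated with a minimal Python sketch. It is not the authors' program. It is a plain Shrake and Rupley style dot count that uses only the first of the two lattices (a cubic grid grouping neighboring atomic centers) and omits the second lattice for surface dots. The function names `sphere_points` and `accessible_area`, the golden-spiral point layout, and the default probe radius of 1.4 are assumptions made for illustration; the abstract itself states that the method does not depend on the point distribution used.

```python
import math
from collections import defaultdict
from itertools import product


def sphere_points(n):
    """Roughly uniform points on the unit sphere (golden-spiral layout, an assumption)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        pts.append((r * math.cos(golden * i), r * math.sin(golden * i), z))
    return pts


def cell_of(x, y, z, size):
    """Integer lattice cell containing the point (x, y, z)."""
    return (math.floor(x / size), math.floor(y / size), math.floor(z / size))


def accessible_area(atoms, probe=1.4, n_points=362):
    """atoms: list of (x, y, z, radius). Returns the accessible area of each atom."""
    radii = [a[3] + probe for a in atoms]
    # With this cell size, any two overlapping expanded spheres have centers
    # in the same or in adjacent cells, so only 27 cells need to be searched.
    size = 2.0 * max(radii)
    grid = defaultdict(list)
    for idx, (x, y, z, _) in enumerate(atoms):
        grid[cell_of(x, y, z, size)].append(idx)

    unit = sphere_points(n_points)
    areas = []
    for i, (x, y, z, _) in enumerate(atoms):
        ri = radii[i]
        cx, cy, cz = cell_of(x, y, z, size)
        # Collect only the atoms whose expanded spheres intersect sphere i.
        neighbours = []
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                if j == i:
                    continue
                xj, yj, zj, _ = atoms[j]
                d2 = (x - xj) ** 2 + (y - yj) ** 2 + (z - zj) ** 2
                if d2 < (ri + radii[j]) ** 2:
                    neighbours.append(j)
        # Count the surface dots that lie outside every neighbouring sphere.
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = x + ri * ux, y + ri * uy, z + ri * uz
            if all(
                (px - atoms[j][0]) ** 2 + (py - atoms[j][1]) ** 2
                + (pz - atoms[j][2]) ** 2 >= radii[j] ** 2
                for j in neighbours
            ):
                exposed += 1
        areas.append(4.0 * math.pi * ri * ri * exposed / len(unit))
    return areas
```

Calling `accessible_area([(0.0, 0.0, 0.0, 1.7), (1.5, 0.0, 0.0, 1.7)])` returns one area per atom, each smaller than that of an isolated sphere because the two expanded spheres overlap. The grid limits the dot test to nearby atoms; the further saving that the abstract attributes to the second lattice is not reproduced here.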
To understand regulatory systems, it would be useful to uniformly determine how different components contribute to the expression of all other genes. We therefore monitored mRNA expression genome-wide, for individual deletions of one-quarter of yeast genes, focusing on (putative) regulators. The resulting genetic perturbation signatures reflect many different properties. These include the architecture of protein complexes and pathways, identification of expression changes compatible with viability, and the varying responsiveness to genetic perturbation. The data are assembled into a genetic perturbation network that shows different connectivities for different classes of regulators. Four feed-forward loop (FFL) types are overrepresented, including incoherent type 2 FFLs that likely represent feedback. Systematic transcription factor classification shows a surprisingly high abundance of gene-specific repressors, suggesting that yeast chromatin is not as generally restrictive to transcription as is often assumed. The data set is useful for studying individual genes and for discovering properties of an entire regulatory system.
A map of 30,181 human gene-based markers was assembled and integrated with the current genetic map by radiation hybrid mapping. The new gene map contains nearly twice as many genes as the previous release, includes most genes that encode proteins of known function, and is twofold to threefold more accurate than the previous version. A redesigned, more informative and functional World Wide Web site (www.ncbi.nlm.nih.gov/genemap) provides the mapping information and associated data and annotations. This resource constitutes an important infrastructure and tool for the study of complex genetic traits, the positional cloning of disease genes, the cross-referencing of mammalian genomes, and validated human transcribed sequences for large-scale studies of gene expression.