The human genome is thought to harbor 50,000 to 100,000 genes, of which about half have been sampled to date in the form of expressed sequence tags. An international consortium was organized to develop and map gene-based sequence tagged site markers on a set of two radiation hybrid panels and a yeast artificial chromosome library. More than 16,000 human genes have been mapped relative to a framework map that contains about 1000 polymorphic genetic markers. The gene map unifies the existing genetic and physical maps with the nucleotide and protein sequence databases in a fashion that should speed the discovery of genes underlying inherited human disease. The integrated resource is available through a site on the World Wide Web at http://www.ncbi.nlm.nih.gov/SCIENCE96/.
A map of 30,181 human gene-based markers was assembled and integrated with the current genetic map by radiation hybrid mapping. The new gene map contains nearly twice as many genes as the previous release, includes most genes that encode proteins of known function, and is twofold to threefold more accurate than the previous version. A redesigned, more informative and functional World Wide Web site (www.ncbi.nlm.nih.gov/genemap) provides the mapping information and associated data and annotations. This resource constitutes an important infrastructure and tool for the study of complex genetic traits, the positional cloning of disease genes, the cross-referencing of mammalian genomes, and validated human transcribed sequences for large-scale studies of gene expression.
Background: Human endogenous retroviruses (HERVs) represent the inheritance of ancient germ-line cell infections by exogenous retroviruses and the subsequent transmission of the integrated proviruses to the descendants. ERVs have the same internal structure as exogenous retroviruses. While no replication-competent HERVs have been recognized, some retain up to three of four intact ORFs. HERVs have been classified before, with varying scope and depth, notably in the RepBase/RepeatMasker system. However, existing classifications are bewildering. There is a need for a systematic, unifying and simple classification. We strived for a classification which is traceable to previous classifications and which encompasses HERV variation within a limited number of clades. Results: The human genome assembly GRCh37/hg19 was analyzed with RetroTector, which primarily detects relatively complete Class I and II proviruses. A total of 3173 HERV sequences were identified. The structure of and relations between these proviruses were resolved through a multi-step classification procedure that involved a novel type of similarity image analysis (“Simage”) which allowed discrimination of heterogeneous (noncanonical) from homogeneous (canonical) HERVs. Of the 3173 HERVs, 1214 were canonical and segregated into 39 canonical clades (groups), belonging to class I (Gamma- and Epsilon-like), II (Beta-like) and III (Spuma-like). The groups were chosen based on (1) sequence (nucleotide and Pol amino acid) similarity, (2) degree of fit to previously published clades, often from RepBase, and (3) taxonomic markers. The groups fell into 11 supergroups. The 1959 noncanonical HERVs contained 31 additional, less well-defined groups. Simage analysis revealed several types of mosaicism, notably recombination and secondary integration. By comparing flanking sequences, LTRs and completeness of gene structure, we deduced that some noncanonical HERVs proliferated after the recombination event.
Groups were further divided into envelope subgroups (altogether 94) based on sequence similarity and characteristic “immunosuppressive domain” motifs. Intra- and inter(super)group, as well as intraclass, recombination involving envelope genes (“env snatching”) was a common event. LTR divergence indicated that HERV-K(HML2) and HERVFC had the most recent integrations, HERVL and HUERSP3 the oldest. Conclusions: A comprehensive HERV classification and characterization approach was undertaken. It should be applicable for classification of all ERVs. Recombination was common among HERV ancestors. Electronic supplementary material: The online version of this article (doi:10.1186/s12977-015-0232-y) contains supplementary material, which is available to authorized users.
Since 1989, about 570 different p53 mutations have been identified in more than 8000 human cancers. A database of these mutations was initiated by M. Hollstein and C. C. Harris in 1990. This database originally consisted of a list of somatic point mutations in the p53 gene of human tumors and cell lines, compiled from the published literature and made available in a standard electronic form. The database is maintained at the International Agency for Research on Cancer (IARC) and updated versions are released twice a year (January and July). The current version (July 1997) contains records on 6800 published mutations and will surpass the 8000 mark in the January 1998 release. The database now contains information on somatic and germline mutations in a new format to facilitate data retrieval. In addition, new tools are constructed to improve data analysis, such as a Mutation Viewer Java applet developed at the European Bioinformatics Institute (EBI) to visualise the location and impact of mutations on p53 protein structure. The database is available in different electronic formats at IARC (http://www.iarc.fr/p53/homepage.htm) or from the EBI server (http://www.ebi.ac.uk). The IARC p53 website also provides reports on database analysis and links with other p53 sites as well as with related databases. In this report, we describe the criteria for inclusion of data, the revised format and the new visualisation tools. We also briefly discuss the relevance of p53 mutations to clinical and biological questions.
The tumor suppressor p53 gene is the most frequently mutated gene in human cancer. To date, more than 10,000 mutations have been described in the literature, and these data are available in various electronic formats on the World Wide Web. Here we describe the structure and format of the different p53 datasets maintained and curated at the International Agency for Research on Cancer (IARC) in Lyon, France. These include p53 somatic mutations (more than 10,000 entries), p53 germline mutations (144 entries), and p53 polymorphisms (13 entries), with the somatic mutations organized into a relational database using Access™. The main features of these datasets are (1) controlled entry with standardized format and restricted vocabulary, (2) inclusion of annotations on individual characteristics and exposures, and (3) a classification of pathologies based on the International Classification of Diseases for Oncology (ICD-O). In addition, several interfaces have been developed to analyze the data in order to produce mutation spectra, codon analyses, or visualization of the mutation with the tertiary structure of the protein. All datasets and tools for analysis are available at http://www.iarc.fr/p53/homepage.
Human blood plasma is a useful source of proteins associated with both health and disease. Analysis of human blood plasma is a challenge due to the large number of peptides and proteins present and the very wide range of concentrations. In order to identify as many proteins as possible for subsequent comparative studies, we developed an industrial-scale (2.5 liter) approach involving sample pooling for the analysis of smaller proteins (M(r) generally < ca. 40 000 and some fragments of very large proteins). Plasma from healthy males was depleted of abundant proteins (albumin and IgG), then smaller proteins and polypeptides were separated into 12 960 fractions by chromatographic techniques. Analysis of proteins and polypeptides was performed by mass spectrometry prior to and after enzymatic digestion. Thousands of peptide identifications were made, permitting the identification of 502 different proteins and polypeptides from a single pool, 405 of which are listed here. The numbers refer to chromatographically separable polypeptide entities present prior to digestion. Combining results from studies with other plasma pools we have identified over 700 different proteins and polypeptides in plasma. Relatively low abundance proteins such as leptin and ghrelin and peptides such as bradykinin, all invisible to two-dimensional gel technology, were clearly identified. Proteins of interest were synthesized by chemical methods for bioassays. We believe that this is the first time that the small proteins in human blood plasma have been separated and analyzed so extensively.
We present an integrated proteomics platform designed for performing differential analyses. Since reproducible results are essential for comparative studies, we explain how we improved reproducibility at every step of our laboratory processes, e.g. by taking advantage of the powerful laboratory information management system we developed. The differential capacity of our platform is validated by detecting known markers in a real sample and by a spiking experiment. We introduce an innovative two-dimensional (2-D) plot for displaying identification results combined with chromatographic data. This 2-D plot is very convenient for detecting differential proteins. We also adapt standard multivariate statistical techniques to show that peptide identification scores can be used for reliable and sensitive differential studies. The interest of the protein separation approach we generally apply is justified by numerous statistics, complemented by a comparison with a simple shotgun analysis performed on a small volume sample. By introducing an automatic integration step after mass spectrometry data identification, we are able to search numerous databases systematically, including the human genome and expressed sequence tags. Finally, we explain how rigorous data processing can be combined with the work of human experts to set high quality standards, and hence obtain reliable (false positive < 0.35%) and nonredundant protein identifications.
We have conducted a detailed structural analysis of 90 kilobases (kb) of the HLA Class III region from the Bat2 gene at the centromeric end to 23 kb beyond TNF. A single contig of 80 kb was sequenced entirely with a group of four smaller contigs covering 10 kb being only partly sequenced. This region contains four known genes and a novel telomeric potential coding region. The genes are bracketed by long, dense clusters of Alu repeats belonging to all the major families. At least six new families of MER repeats and one pseudogene are intercalated within and between the Alu clusters. The most telomeric 3.8 kb contains three potential exons, one of which bears strong homology to the ankyrin domain of the DNA binding factors NF kappa B and I kappa B.