Knowledge of the complete genomic DNA sequence of an organism allows a systematic approach to defining its genetic components. The genomic sequence provides access to the complete structures of all genes, including those without known function, their control elements, and, by inference, the proteins they encode, as well as all other biologically important sequences. Furthermore, the sequence is a rich and permanent source of information for the design of further biological studies of the organism and for the study of evolution through cross-species sequence comparison. The power of this approach has been amply demonstrated by the determination of the sequences of a number of microbial and model organisms. The next step is to obtain the complete sequence of the entire human genome. Here we report the sequence of the euchromatic part of human chromosome 22. The sequence obtained consists of 12 contiguous segments spanning 33.4 megabases, contains at least 545 genes and 134 pseudogenes, and provides the first view of the complex chromosomal landscapes that will be found in the rest of the genome.
Human cancers often carry many somatically acquired genomic rearrangements, some of which may be implicated in cancer development. However, conventional strategies for characterizing rearrangements are laborious and low-throughput and have low sensitivity or poor resolution. We used massively parallel sequencing to generate sequence reads from both ends of short DNA fragments derived from the genomes of two individuals with lung cancer. By investigating read pairs that did not align correctly with respect to each other on the reference human genome, we characterized 306 germline structural variants and 103 somatic rearrangements to the base-pair level of resolution. The patterns of germline and somatic rearrangement were markedly different. Many somatic rearrangements were from amplicons, although rearrangements outside these regions, notably including tandem duplications, were also observed. Some somatic rearrangements led to abnormal transcripts, including two from internal tandem duplications and two fusion transcripts created by interchromosomal rearrangements. Germline variants were predominantly mediated by retrotransposition, often involving AluY and LINE elements. The results demonstrate the feasibility of systematic, genome-wide characterization of rearrangements in complex human cancer genomes, raising the prospect of a new harvest of genes associated with cancer using this strategy.
© 2008 Nature Publishing Group. Europe PMC Funders Author Manuscripts.
Correspondence should be addressed to M.R.S. (mrs@sanger.ac.uk) or P.A.F. (paf@sanger.ac.uk).
4 These authors contributed equally to this work.
AUTHOR CONTRIBUTIONS
P.J.C. and P.J.S. equally contributed to generating and analysing sequencing, copy number, PCR and breakpoint data, and wrote the manuscript. E.D.P. coordinated the bioinformatic analyses with support for mapping from H.L. and A.C. and for pipelining from L.A.S., C.L., A.M. and J.W.T. S.O., S.E. and C.H. performed the confirmatory PCRs and Sanger sequencing. T.S. and P.A.W.E. performed FISH and SKY experiments. I.G. and M.A.Q. undertook library production from the cell lines, and C.M.C. and D.J.T. ran the massively parallel sequencing instruments. C.B., R.D. and M.E.H. contributed to the analysis and interpretation of data. G.R.B., M.R.S. and P.A.F. coordinated the research, interpreted the data and wrote the manuscript.
Somatic genetic changes involved in cancer causation include point mutations, genomic rearrangements and changes in copy number [1]. Most of the currently identified genes associated with cancer contribute to oncogenesis as a result of somatic rearrangements that result either in fusion transcripts or in transcriptional deregulation by apposing enhancer or promoter elements to intact protein coding sequences [1]. The large majority of the known somatically rearranged genes associated with cancer are found in the small minority of human cancers comprising leukemias, lymphomas and soft tissue tumors (see URLs section in Methods). Fus...
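The lung cancer abstract above identifies rearrangements from read pairs that do not align correctly with respect to each other on the reference genome. A minimal Python sketch of that general idea follows. It is an illustration only, not the authors' pipeline: the `ReadPair` record, the category names, and the insert-size thresholds (`min_insert`, `max_insert`) are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    """Hypothetical record for one aligned paired-end read."""
    chrom1: str
    pos1: int      # leftmost aligned position of read 1
    strand1: str   # '+' or '-'
    chrom2: str
    pos2: int      # leftmost aligned position of read 2
    strand2: str   # '+' or '-'


def classify_pair(pair, min_insert=100, max_insert=600):
    """Label a read pair as concordant or as one kind of discordant pair.

    The thresholds are assumed values for a short-insert library; in
    practice they would be estimated from the observed insert sizes.
    """
    # Mates on different chromosomes suggest an interchromosomal event.
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"

    # Order the two mates by genomic position.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )

    # A normal pair has the left mate on '+' and the right mate on '-'.
    if left_strand == right_strand:
        return "inversion-like"
    if left_strand == "-" and right_strand == "+":
        # Mates point away from each other, as seen at tandem duplications.
        return "everted"

    span = right_pos - left_pos
    if span > max_insert:
        return "deletion-like"
    if span < min_insert:
        return "insertion-like"
    return "concordant"


if __name__ == "__main__":
    example = ReadPair("chr1", 1000, "+", "chr1", 50000, "-")
    print(classify_pair(example))
```

Pairs labelled as anything other than concordant are only candidates. As the abstract indicates, distinguishing germline from somatic events and resolving breakpoints to the base pair requires further work, such as the confirmatory PCR and sequencing mentioned in the author contributions.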
Phylogenetic relationships in the genus Nicotiana were investigated using parsimony analyses of the internal transcribed spacer (ITS) regions of nuclear ribosomal DNA (nrDNA). In addition, origins of some amphidiploid taxa in Nicotiana were investigated using the techniques of genomic in situ hybridization (GISH), and the results of both sets of analyses were used to evaluate previous hypotheses about the origins of these taxa. Phylogenetic analyses of the ITS nrDNA data were performed on the entire genus (66 of 77 naturally occurring species, plus three artificial hybrids), comprising both diploid and polyploid taxa, and on the diploid taxa only (35 species) to examine the effects of amphidiploids on estimates of relationships. All taxa, regardless of ploidy, produced clean, single copies of the ITS region, even though some taxa are hybrids. Results are compared with a published plastid (matK) phylogeny using fewer, but many of the same, taxa. The patterns of relationships in Nicotiana, as seen in both analyses, are largely congruent with each other and previous evolutionary ideas based on morphology and cytology, but some important differences are apparent. None of the currently recognized subgenera of Nicotiana is monophyletic and, although most of the currently recognized sections are coherent, others are clearly polyphyletic. Relying solely upon ITS nrDNA analysis to reveal phylogenetic patterns in a complex genus such as Nicotiana is insufficient, and it is clear that conventional analysis of single data sets, such as ITS, is likely to be misleading in at least some respects about evolutionary history. ITS sequences of natural and well-documented amphidiploids are similar or identical to one of their two parents (usually, but not always, the maternal parent) and are not in any sense themselves 'hybrid'. Knowing how ITS evolves in artificial amphidiploids gives insight into what ITS analysis might reveal about naturally occurring amphidiploids of unknown origin, and it is in this perspective that analysis of ITS sequences is highly informative.
The Human Epigenome Project aims to identify, catalogue, and interpret genome-wide DNA methylation phenomena. Occurring naturally on cytosine bases at cytosine–guanine dinucleotides, DNA methylation is intimately involved in diverse biological processes and the aetiology of many diseases. Differentially methylated cytosines give rise to distinct profiles, thought to be specific for gene activity, tissue type, and disease state. The identification of such methylation variable positions will significantly improve our understanding of genome biology and our ability to diagnose disease. Here, we report the results of the pilot study for the Human Epigenome Project entailing the methylation analysis of the human major histocompatibility complex. This study involved the development of an integrated pipeline for high-throughput methylation analysis using bisulphite DNA sequencing, discovery of methylation variable positions, epigenotyping by matrix-assisted laser desorption/ionisation mass spectrometry, and development of an integrated public database available at http://www.epigenome.org. Our analysis of DNA methylation levels within the major histocompatibility complex, including regulatory exonic and intronic regions associated with 90 genes in multiple tissues and individuals, reveals a bimodal distribution of methylation profiles (i.e., the vast majority of the analysed regions were either hypo- or hypermethylated), tissue specificity, inter-individual variation, and correlation with independent gene expression data.
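The bimodal distribution reported in the Human Epigenome Project abstract above (most analysed regions being either hypo- or hypermethylated) can be illustrated with a short Python sketch. The cut-offs `hypo_max` and `hyper_min` and the function names are assumptions made for this example; the abstract does not state the thresholds used in the study.

```python
from collections import Counter


def classify_region(level, hypo_max=0.3, hyper_min=0.7):
    """Classify one region by its methylation level (a fraction in [0, 1]).

    The cut-offs are assumed values for illustration only.
    """
    if not 0.0 <= level <= 1.0:
        raise ValueError("methylation level must be between 0 and 1")
    if level <= hypo_max:
        return "hypomethylated"
    if level >= hyper_min:
        return "hypermethylated"
    return "intermediate"


def summarize(levels, hypo_max=0.3, hyper_min=0.7):
    """Count how many regions fall into each methylation class."""
    counts = Counter(classify_region(x, hypo_max, hyper_min) for x in levels)
    # Report all three classes, including those with zero regions.
    return {
        "hypomethylated": counts["hypomethylated"],
        "intermediate": counts["intermediate"],
        "hypermethylated": counts["hypermethylated"],
    }


if __name__ == "__main__":
    # Made-up levels, not data from the study.
    print(summarize([0.05, 0.10, 0.90, 0.95, 0.50]))
```

In a bimodal data set the intermediate count would be small relative to the other two classes. The input values here are invented purely to exercise the function.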
Gene trapping is a method of generating murine embryonic stem (ES) cell lines containing insertional mutations in known and novel genes. A number of international groups have used this approach to create sizeable public cell line repositories available to the scientific community for the generation of mutant mouse strains. The major gene trapping groups worldwide have recently joined together to centralize access to all publicly available gene trap lines by developing a user-oriented Website for the International Gene Trap Consortium (IGTC). This collaboration provides an impressive public informatics resource comprising ∼45 000 well-characterized ES cell lines which currently represent ∼40% of known mouse genes, all freely available for the creation of knockout mice on a non-collaborative basis. To standardize annotation and provide high confidence data for gene trap lines, a rigorous identification and annotation pipeline has been developed combining genomic localization and transcript alignment of gene trap sequence tags to identify trapped loci. This information is stored in a new bioinformatics database accessible through the IGTC Website interface. The IGTC Website allows users to browse and search the database for trapped genes, BLAST sequences against gene trap sequence tags, and view trapped genes within biological pathways. In addition, IGTC data have been integrated into major genome browsers and bioinformatics sites to provide users with outside portals for viewing this data. The development of the IGTC Website marks a major advance by providing the research community with the data and tools necessary to effectively use public gene trap resources for the large-scale characterization of mammalian gene function.
The Ensembl Web site (http://www.ensembl.org/) is the principal user interface to the data of the Ensembl project, and currently serves >500,000 pages (∼2.5 million hits) per week, providing access to >80 GB (gigabyte) of data to users in more than 80 countries. Built atop an open-source platform comprising Apache/mod_perl and the MySQL relational database management system, it is modular, extensible, and freely available. It is being actively reused and extended in several different projects, and has been downloaded and installed in companies and academic institutions worldwide. Here, we describe some of the technical features of the site, with particular reference to its dynamic configuration that enables it to handle disparate data from multiple species.