The Ensembl project has been aggregating, processing, integrating and redistributing genomic datasets since the initial releases of the draft human genome, with the aim of accelerating genomics research through rapid open distribution of public data. Large amounts of raw data are thus transformed into knowledge, which is made available via a multitude of channels, in particular our browser (http://www.ensembl.org). Over time, we have expanded in multiple directions. First, our resources describe multiple fields of genomics, in particular gene annotation, comparative genomics, genetics and epigenomics. Second, we cover a growing number of genome assemblies; Ensembl Release 90 contains exactly 100. Third, our databases feed simultaneously into an array of services designed around different use cases, ranging from quick browsing to genome-wide bioinformatic analysis. We present here the latest developments of the Ensembl project, with a focus on managing an increasing number of assemblies, supporting efforts in genome interpretation and improving our browser.
The Ensembl (https://www.ensembl.org) is a system for generating and distributing genome annotation such as genes, variation, regulation and comparative genomics across the vertebrate subphylum and key model organisms. The Ensembl annotation pipeline is capable of integrating experimental and reference data from multiple providers into a single integrated resource. Here, we present 94 newly annotated and re-annotated genomes, bringing the total number of genomes offered by Ensembl to 227. This represents the single largest expansion of the resource since its inception. We also detail our continued efforts to improve human annotation, developments in our epigenome analysis and display, a new tool for imputing causal genes from genome-wide association studies and visualisation of variation within a 3D protein model. Finally, we present information on our new website. Both software and data are made available without restriction via our website, online tools platform and programmatic interfaces (available under an Apache 2.0 license) and data updates made available four times a year.
The Ensembl project (https://www.ensembl.org) annotates genomes and disseminates genomic data for vertebrate species. We create detailed and comprehensive annotation of gene structures, regulatory elements and variants, and enable comparative genomics by inferring the evolutionary history of genes and genomes. Our integrated genomic data are made available in a variety of ways, including genome browsers, search interfaces, specialist tools such as the Ensembl Variant Effect Predictor, download files and programmatic interfaces. Here, we present recent Ensembl developments including two new website portals. Ensembl Rapid Release (http://rapid.ensembl.org) is designed to provide core tools and services for genomes as soon as possible and has been deployed to support large biodiversity sequencing projects. Our SARS-CoV-2 genome browser (https://covid-19.ensembl.org) integrates our own annotation with publicly available genomic data from numerous sources to facilitate the use of genomics in the international scientific response to the COVID-19 pandemic. We also report on other updates to our annotation resources, tools and services. All Ensembl data and software are freely available without restriction.
The Ensembl project (https://www.ensembl.org) makes key genomic data sets available to the entire scientific community without restrictions. Ensembl seeks to be a fundamental resource driving scientific progress by creating, maintaining and updating reference genome annotation and comparative genomics resources. This year we describe our new and expanded gene, variant and comparative annotation capabilities, which led to a 50% increase in the number of vertebrate genomes we support. We have also doubled the number of available human variants and added regulatory regions for many mouse cell types and developmental stages. Our data sets and tools are available via the Ensembl website as well as a through a RESTful webservice, Perl application programming interface and as data files for download.
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of interfaces to genomic data across the tree of life, including reference genome sequence, gene models, transcriptional data, genetic variation and comparative analysis. Data may be accessed via our website, online tools platform and programmatic interfaces, with updates made four times per year (in synchrony with Ensembl). Here, we provide an overview of Ensembl Genomes, with a focus on recent developments. These include the continued growth, more robust and reproducible sets of orthologues and paralogues, and enriched views of gene expression and gene function in plants. Finally, we report on our continued deeper integration with the Ensembl project, which forms a key part of our future strategy for dealing with the increasing quantity of available genome-scale data across the tree of life.
SummaryBackgroundSince June, 2012, Middle East respiratory syndrome coronavirus (MERS-CoV) has, worldwide, caused 104 infections in people including 49 deaths, with 82 cases and 41 deaths reported from Saudi Arabia. In addition to confirming diagnosis, we generated the MERS-CoV genomic sequences obtained directly from patient samples to provide important information on MERS-CoV transmission, evolution, and origin.MethodsFull genome deep sequencing was done on nucleic acid extracted directly from PCR-confirmed clinical samples. Viral genomes were obtained from 21 MERS cases of which 13 had 100%, four 85–95%, and four 30–50% genome coverage. Phylogenetic analysis of the 21 sequences, combined with nine published MERS-CoV genomes, was done.FindingsThree distinct MERS-CoV genotypes were identified in Riyadh. Phylogeographic analyses suggest the MERS-CoV zoonotic reservoir is geographically disperse. Selection analysis of the MERS-CoV genomes reveals the expected accumulation of genetic diversity including changes in the S protein. The genetic diversity in the Al-Hasa cluster suggests that the hospital outbreak might have had more than one virus introduction.InterpretationWe present the largest number of MERS-CoV genomes (21) described so far. MERS-CoV full genome sequences provide greater detail in tracking transmission. Multiple introductions of MERS-CoV are identified and suggest lower R0 values. Transmission within Saudi Arabia is consistent with either movement of an animal reservoir, animal products, or movement of infected people. Further definition of the exposures responsible for the sporadic introductions of MERS-CoV into human populations is urgently needed.FundingSaudi Arabian Ministry of Health, Wellcome Trust, European Community, and National Institute of Health Research University College London Hospitals Biomedical Research Centre.
Motivation: An accurate genome assembly from short read sequencing data is critical for downstream analysis, for example allowing investigation of variants within a sequenced population. However, assembling sequencing data from virus samples, especially RNA viruses, into a genome sequence is challenging due to the combination of viral population diversity and extremely uneven read depth caused by amplification bias in the inevitable reverse transcription and polymerase chain reaction amplification process of current methods.Results: We developed a new de novo assembler called IVA (Iterative Virus Assembler) designed specifically for read pairs sequenced at highly variable depth from RNA virus samples. We tested IVA on datasets from 140 sequenced samples from human immunodeficiency virus-1 or influenza-virus-infected people and demonstrated that IVA outperforms all other virus de novo assemblers.Availability and implementation: The software runs under Linux, has the GPLv3 licence and is freely available from http://sanger-pathogens.github.io/ivaContact: iva@sanger.ac.ukSupplementary information: Supplementary data are available at Bioinformatics online.
BackgroundXenotropic murine leukaemia viruses (MLV-X) are endogenous gammaretroviruses that infect cells from many species, including humans. Xenotropic murine leukaemia virus-related virus (XMRV) is a retrovirus that has been the subject of intense debate since its detection in samples from humans with prostate cancer (PC) and chronic fatigue syndrome (CFS). Controversy has arisen from the failure of some studies to detect XMRV in PC or CFS patients and from inconsistent detection of XMRV in healthy controls.ResultsHere we demonstrate that Taqman PCR primers previously described as XMRV-specific can amplify common murine endogenous viral sequences from mouse suggesting that mouse DNA can contaminate patient samples and confound specific XMRV detection. To consider the provenance of XMRV we sequenced XMRV from the cell line 22Rv1, which is infected with an MLV-X that is indistinguishable from patient derived XMRV. Bayesian phylogenies clearly show that XMRV sequences reportedly derived from unlinked patients form a monophyletic clade with interspersed 22Rv1 clones (posterior probability >0.99). The cell line-derived sequences are ancestral to the patient-derived sequences (posterior probability >0.99). Furthermore, pol sequences apparently amplified from PC patient material (VP29 and VP184) are recombinants of XMRV and Moloney MLV (MoMLV) a virus with an envelope that lacks tropism for human cells. Considering the diversity of XMRV we show that the mean pairwise genetic distance among env and pol 22Rv1-derived sequences exceeds that of patient-associated sequences (Wilcoxon rank sum test: p = 0.005 and p < 0.001 for pol and env, respectively). Thus XMRV sequences acquire diversity in a cell line but not in patient samples. These observations are difficult to reconcile with the hypothesis that published XMRV sequences are related by a process of infectious transmission.ConclusionsWe provide several independent lines of evidence that XMRV detected by sensitive PCR methods in patient samples is the likely result of PCR contamination with mouse DNA and that the described clones of XMRV arose from the tumour cell line 22Rv1, which was probably infected with XMRV during xenografting in mice. We propose that XMRV might not be a genuine human pathogen.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.