Ensembl (https://www.ensembl.org) has produced high-quality genomic resources for vertebrates and model organisms for more than twenty years. During that time, our resources, services and tools have continually evolved in line with both the publicly available genome data and the downstream research and applications that utilise the Ensembl platform. In recent years we have witnessed a dramatic shift in the genomic landscape. There has been a large increase in the number of high-quality reference genomes through global biodiversity initiatives. In parallel, there have been major advances towards pangenome representations of higher species, where many alternative genome assemblies representing different breeds, cultivars, strains and haplotypes are now available. In order to support these efforts and accelerate downstream research, it is our goal at Ensembl to create high-quality annotations, tools and services for species across the tree of life. Here, we report our resources for popular reference genomes, the dramatic growth of our annotations (including haplotypes from the first human pangenome graphs), updates to the Ensembl Variant Effect Predictor (VEP), interactive protein structure predictions from AlphaFold DB, and the beta release of our new website.
Motivation: The Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has recently emerged as the agent responsible for the coronavirus disease (COVID-19) pandemic. This virus is closely related to coronaviruses infecting bats and Malayan pangolins, the latter suspected to be an intermediate host in the passage to humans. Several genomic mutations affecting viral proteins have been identified, contributing to the understanding of the recent animal-to-human transmission. However, the capacity of SARS-CoV-2 to encode functional putative microRNAs (miRNAs) remains largely unexplored.
Results: We used deep learning to discover 12 candidate stem-loop structures hidden in the viral protein-coding genome. Among these precursors, the expression of eight mature miRNA-like sequences was confirmed in small RNA-seq data from SARS-CoV-2-infected human cells. The predicted miRNAs are likely to target a subset of human genes, 109 of which are transcriptionally deregulated upon infection. Remarkably, 28 of those genes potentially targeted by SARS-CoV-2 miRNAs are down-regulated in infected human cells. Most of them have been related to respiratory diseases and viral infection, including several conditions previously associated with SARS-CoV-1 and SARS-CoV-2. Comparison of SARS-CoV-2 pre-miRNA sequences with those from bat and pangolin coronaviruses suggests that single-nucleotide mutations could have helped its progenitors jump inter-species boundaries, allowing the gain of novel mature miRNAs that target human mRNAs. Our results suggest that the recent acquisition of novel miRNA-like sequences in the SARS-CoV-2 genome may have contributed to modulating the transcriptional reprogramming of the new host upon infection.
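The abstract does not state how human targets of the predicted miRNAs were identified. A common starting point for miRNA target prediction is canonical seed matching: searching an mRNA 3' UTR for sites complementary to nucleotides 2-8 of the mature miRNA. The Python sketch below assumes that simple 7-mer seed rule; the function names and the sequences in the usage line are hypothetical illustrations, not the SARS-CoV-2 miRNAs or the authors' pipeline.

```python
# Hypothetical sketch of canonical miRNA seed matching (7-mer rule).
# Assumption: the source does not say which target-prediction method was used.

def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A/C/G/U)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(rna.upper()))


def seed_sites(mirna: str, utr: str) -> list[int]:
    """Return 0-based start positions in `utr` of sites complementary
    to the miRNA seed (nucleotides 2-8 of the mature miRNA)."""
    seed = mirna.upper()[1:8]        # nucleotides 2-8, 1-based
    site = reverse_complement(seed)  # sequence the seed pairs with on the mRNA
    utr = utr.upper()
    return [
        i
        for i in range(len(utr) - len(site) + 1)
        if utr[i:i + len(site)] == site
    ]


if __name__ == "__main__":
    # Hypothetical toy sequences, for illustration only.
    mirna = "AUGCUAGCAUUGACGGAUCCAA"
    utr = "AAAGCUAGCAUUUGCUAGCA"
    print(seed_sites(mirna, utr))
```

Real target predictors add further criteria (site context, conservation, pairing energy), so a seed match alone is only a candidate site.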
Sunflower germplasm collections are valuable resources for broadening the genetic base of commercial hybrids and mitigating the risk posed by climate events. Currently, the most studied sunflower pre-breeding collections worldwide belong to INTA (Argentina), INRA (France), and USDA-UBC (United States of America–Canada). In this work, we assess the amount and distribution of genetic diversity (GD) available within and between these collections to estimate the distribution pattern of global diversity. We implemented a mixed genotyping strategy, combining proprietary genotyping-by-sequencing data with public whole-genome-sequencing data, to generate an integrative matrix of 11,834 single nucleotide polymorphisms (SNPs) common to the three breeding collections. In general, the GD estimates obtained were moderate. An analysis of molecular variance provided evidence of population structure between breeding collections. However, the optimal number of subpopulations remains unclear: discriminant analysis of principal components suggested K = 12, the Bayesian STRUCTURE algorithm K = 6, and distance-based methods K = 9, and no single unifying characteristic is apparent for any of the inferred groups. Different overall patterns of linkage disequilibrium (LD) were observed across chromosomes, with Chr10, Chr17, Chr5, and Chr2 showing the highest LD. This work represents the largest and most comprehensive inter-breeding-collection analysis of genomic diversity for cultivated sunflower conducted to date.
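The abstract does not name the specific GD or LD estimators used. As a minimal sketch, assuming GD is measured as Nei's expected heterozygosity (2p(1 - p) per biallelic SNP) and LD is approximated by the squared correlation of genotype dosages (a common stand-in for r² with unphased data), the Python below computes both from SNPs coded as alternate-allele counts (0, 1, 2). The function names and the toy genotypes are hypothetical.

```python
# Hypothetical sketch: per-SNP expected heterozygosity and genotypic LD r^2.
# Assumption: genotypes are coded 0/1/2 (alternate-allele counts) with no
# missing data; the source does not specify the estimators it used.
from statistics import mean


def expected_heterozygosity(genotypes: list[int]) -> float:
    """Nei's expected heterozygosity 2p(1-p) for one biallelic SNP."""
    p = sum(genotypes) / (2 * len(genotypes))  # alternate-allele frequency
    return 2 * p * (1 - p)


def genotype_r2(snp_a: list[int], snp_b: list[int]) -> float:
    """Squared Pearson correlation of allele dosages between two SNPs,
    used here as an approximation of LD r^2 for unphased genotypes."""
    ma, mb = mean(snp_a), mean(snp_b)
    cov = sum((a - ma) * (b - mb) for a, b in zip(snp_a, snp_b))
    var_a = sum((a - ma) ** 2 for a in snp_a)
    var_b = sum((b - mb) ** 2 for b in snp_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a monomorphic SNP has no defined r^2; report 0 here
    return cov ** 2 / (var_a * var_b)


if __name__ == "__main__":
    # Hypothetical toy data: 4 lines (rows) x 3 SNPs (columns).
    matrix = [
        [0, 0, 0],
        [1, 1, 0],
        [2, 2, 0],
        [1, 1, 2],
    ]
    snps = [list(col) for col in zip(*matrix)]  # transpose to per-SNP lists
    he = [expected_heterozygosity(s) for s in snps]
    print("per-SNP He:", he, "mean GD:", mean(he))
    print("r2(SNP1, SNP2):", genotype_r2(snps[0], snps[1]))
```

Averaging He over all SNPs gives one summary GD value per collection; computing r² for SNP pairs along each chromosome is how chromosome-level LD patterns are typically compared.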