High-throughput data production technologies, particularly ‘next-generation’ DNA sequencing, have ushered in widespread and disruptive changes to biomedical research. Making sense of the large datasets produced by these technologies requires sophisticated statistical and computational methods, as well as substantial computational power. This has led to an acute crisis in the life sciences, as researchers without informatics training attempt to perform computation-dependent analyses. Since 2005, the Galaxy project has worked to address this problem by providing a framework that makes advanced computational tools usable by non-experts. Galaxy seeks to make data-intensive research more accessible, transparent and reproducible by providing a Web-based environment in which users can perform computational analyses and have all of the details automatically tracked for later inspection, publication, or reuse. In this report we highlight recently added features enabling biomedical analyses on a large scale.
Transposable element (TE) activity is repressed in the animal germline by PIWI-interacting RNAs (piRNAs), a class of small RNAs produced by genomic loci composed mostly of TE sequences. The mechanism by which these loci are induced to produce piRNAs is still enigmatic. We have shown that, in Drosophila melanogaster, a cluster of tandemly repeated P-lacZ-white transgenes can be activated for piRNA production by maternal inheritance of cytoplasm containing homologous piRNAs. This activated state is stably transmitted over generations and allows trans-silencing of a homologous transgenic target in the female germline. Such an epigenetic conversion displays the functional characteristics of a paramutation, i.e., a heritable epigenetic modification of one allele by the other. We report here that the piRNA production and trans-silencing capacities of the paramutated cluster depend on the rhino, cutoff, and zucchini genes, which are involved in primary piRNA biogenesis in the germline, as well as on the aubergine gene, which is implicated in the ping-pong piRNA amplification step. In addition to 23- to 28-nt piRNAs, the paramutated cluster produces 21-nt RNAs, but these 21-nt RNAs are not necessary for paramutation to occur. Their production requires Dicer-2 as well as all the piRNA genes tested. Moreover, cytoplasmic transmission of piRNAs homologous to only a subregion of the transgenic locus can generate a strong paramutated locus that produces piRNAs along the whole length of the transgenes. Finally, we observed that maternally inherited transgenic small RNAs can also impact transgene expression in the soma. In conclusion, paramutation involves both nuclear (Rhino, Cutoff) and cytoplasmic (Aubergine, Zucchini) actors of the piRNA pathway.
In addition, since it is observed between nonfully homologous loci located on different chromosomes, paramutation may play a crucial role in epigenome shaping in Drosophila natural populations.

KEYWORDS: gene regulation; trans-generational epigenetics; noncoding small RNAs; mobile DNA; Drosophila

Genomes must confront the presence of a large fraction of mobile DNA whose activity can result in severe deleterious effects on chromosome stability and gametogenesis. In the germline of animals, a system of genomic traps exists into which any transposable element (TE) can insert, thereby generating loci that contain a catalog of potentially dangerous sequences that have to be repressed (Brennecke et al. 2007; Pane et al. 2011; Iwasaki et al. 2015). In the Drosophila melanogaster germline, most of these loci are transcribed in both directions (dual-strand clusters) and undergo noncanonical transcription and RNA processing (Mohn et al. 2014; Zhang et al. 2014). This results in production of noncoding small RNAs having the capacity to target the transcripts of the homologous, potentially active, TE copies scattered throughout the genome. These small RNAs are called PIWI-interacting RNAs (piRNAs) and repress TE activity at both the transcriptional and post-transcriptional levels (Sa...
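The ping-pong amplification step mentioned in the abstract is usually detected in small RNA sequencing data as an excess of sense/antisense piRNA pairs whose 5' ends overlap by 10 nt. The sketch below is an illustration of that general signature, not the authors' pipeline. It assumes reads have already been mapped and summarized as records of a hypothetical `Read` type (chromosome, strand, 0-based leftmost coordinate, length), and it keeps only the 23- to 28-nt piRNA class so that the 21-nt RNAs are excluded:

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Read(NamedTuple):
    """Hypothetical summary of one mapped small RNA read."""
    chrom: str
    strand: str  # "+" or "-"
    start: int   # 0-based leftmost reference coordinate
    length: int  # read length in nt


def is_pirna_sized(read: Read, lo: int = 23, hi: int = 28) -> bool:
    # Keep only the 23- to 28-nt class; 21-nt RNAs are excluded.
    return lo <= read.length <= hi


def five_prime(read: Read) -> int:
    # "+" reads begin at their leftmost base; "-" reads at their rightmost base.
    return read.start if read.strand == "+" else read.start + read.length - 1


def overlap_histogram(reads: Iterable[Read], max_overlap: int = 20) -> Counter:
    """Count sense/antisense piRNA pairs by 5'-5' overlap length (in nt)."""
    plus: Counter = Counter()
    minus: Counter = Counter()
    for r in reads:
        if is_pirna_sized(r):
            key = (r.chrom, five_prime(r))
            (plus if r.strand == "+" else minus)[key] += 1
    hist: Counter = Counter()
    for (chrom, p5), n_plus in plus.items():
        for overlap in range(1, max_overlap + 1):
            # A "-" read whose 5' end sits overlap-1 nt downstream of p5
            # overlaps this "+" read by exactly `overlap` nt (overlap <= 20 < 23).
            n_minus = minus.get((chrom, p5 + overlap - 1), 0)
            if n_minus:
                hist[overlap] += n_plus * n_minus
    return hist
```

For example, a 25-nt "+" read starting at position 100 and a 25-nt "-" read starting at 85 share 10 nt at their 5' ends, so they are counted at overlap 10; a 21-nt "-" read ending at the same position is ignored. Across a real library, a ping-pong signature would appear as a peak at 10 in the returned histogram.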
Transposable elements (TEs) play a significant role in evolution, contributing to genetic variation. However, TE mobilization in somatic cells is not well understood. Here, we address the prevalence of transposition in a somatic tissue, exploiting the Drosophila midgut as a model. Using whole-genome sequencing of in vivo clonally expanded gut tissue, we have mapped hundreds of high-confidence somatic TE integration sites genome-wide. We show that somatic retrotransposon insertions are associated with inactivation of the tumor suppressor Notch, likely contributing to neoplasia formation. Moreover, applying Oxford Nanopore long-read sequencing technology we provide evidence for tissue-specific differences in retrotransposition. Comparing somatic TE insertional activity with transcriptomic and small RNA sequencing data, we demonstrate that transposon mobility cannot be simply predicted by whole tissue TE expression levels or by small RNA pathway activity. Finally, we reveal that somatic TE insertions in the adult fly intestine are enriched in genic regions and in transcriptionally active chromatin. Together, our findings provide clear evidence of ongoing somatic transposition in Drosophila and delineate previously unknown features underlying somatic TE mobility in vivo.
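Saying that insertions are "enriched in genic regions" amounts to comparing the observed share of insertion sites inside genes with the share expected if sites were spread uniformly over the genome. A minimal sketch of that arithmetic follows, assuming insertion sites and gene annotations are already available as 0-based, half-open coordinates; the function names and input shapes are hypothetical, and the uniform-genome expectation is a naive stand-in for whatever null model the authors actually used:

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), 0-based


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort and merge overlapping intervals so genic bp are not double-counted."""
    merged: List[Interval] = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def in_intervals(pos: int, merged: List[Interval]) -> bool:
    # Find the last interval whose start is <= pos, then test containment.
    i = bisect_right(merged, (pos, float("inf"))) - 1
    return i >= 0 and merged[i][0] <= pos < merged[i][1]


def genic_enrichment(insertions: List[Tuple[str, int]],
                     genes: Dict[str, List[Interval]],
                     chrom_sizes: Dict[str, int]) -> float:
    """Observed fraction of insertions in genes / genic fraction of the genome."""
    merged = {chrom: merge(iv) for chrom, iv in genes.items()}
    genic_bp = sum(e - s for iv in merged.values() for s, e in iv)
    expected = genic_bp / sum(chrom_sizes.values())
    hits = sum(in_intervals(pos, merged.get(chrom, [])) for chrom, pos in insertions)
    observed = hits / len(insertions)
    return observed / expected
```

A ratio above 1 means more insertions fall in genes than a uniform distribution would predict. The same function could be pointed at intervals of transcriptionally active chromatin instead of genes; a real analysis would also need to correct for mappability and sequencing coverage.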
The RNA interference (RNAi) pathway plays an important role in antiviral immunity in insects. To counteract the RNAi-mediated immune response of their hosts, several insect viruses, such as Flock house virus, Drosophila C virus, and Cricket paralysis virus, encode potent viral suppressors of RNAi (VSRs). Because of the importance of RNAi in antiviral defense in insects, other insect viruses are likely to encode VSRs as well. In this chapter, we describe a detailed protocol for an RNAi reporter assay in Drosophila S2 cells for the identification of VSR activity.
Background: Long-read sequencing can be applied to generate very long contigs and even completely assembled genomes at relatively low cost and with minimal sample preparation. As a result, long-read sequencing platforms are becoming more popular. In this respect, the Oxford Nanopore Technologies–based “nanopore” long-read sequencing platform is becoming a widely used tool with a broad range of applications and end-users. However, exploring and manipulating the complex data generated by long-read sequencing platforms requires specialized bioinformatics platforms and tools to process the long-read data correctly. Importantly, such tools should also help democratize bioinformatics analysis by being easy for researchers to access and use.

Results: The Galaxy platform provides a user-friendly interface to computational command line–based tools, handles the software dependencies, and provides refined workflows. Users do not need programming experience or advanced computing skills. The interface enables researchers to perform powerful bioinformatics analysis, including the assembly and analysis of short- or long-read sequence data. The newly developed “NanoGalaxy” is a Galaxy-based toolkit for analysing long-read sequencing data, which is suitable for diverse applications, including de novo genome assembly from genomic, metagenomic, and plasmid sequence reads.

Conclusions: A range of best-practice tools and workflows for long-read sequence genome assembly has been integrated into the NanoGalaxy platform to give researchers easy access to bioinformatics tools. NanoGalaxy is freely available at the European Galaxy server https://nanopore.usegalaxy.eu with supporting self-learning training material available at https://training.galaxyproject.org.
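De novo assembly is the central use case named here, and a common first check on any long-read assembly is its contiguity, usually summarized as N50. The sketch below illustrates that statistic in plain Python; it is not part of NanoGalaxy, the function names are hypothetical, and assembly-evaluation tools typically report N50 directly:

```python
from typing import Iterable, List


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of each record in a FASTA file, given its lines."""
    lengths: List[int] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            lengths.append(0)  # header line starts a new record
        elif lengths:
            lengths[-1] += len(line)  # sequence may wrap over several lines
    return lengths


def n50(contig_lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    if total <= 0:
        raise ValueError("assembly has no sequence")
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-negative lengths")
```

For contigs of 100, 200, 300 and 400 bp (1,000 bp in total), the two longest contigs already cover 700 bp, so the N50 is 300. With a real assembly, one would call something like `n50(fasta_lengths(open("assembly.fasta")))`, where the file name is a placeholder.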
To the Editor: The COVID-19 pandemic is the first health crisis characterized by large amounts of genomic data [1]. Computational infrastructure can be a bottleneck for data analysis, amplifying global inequalities in the ability to track SARS-CoV-2 evolution. This is an issue even in developed countries, as computational infrastructure requires expertise in resource procurement, configuration and maintenance. Commercial computational clouds do not fully address the problem because these resources must still be configured and funded. Furthermore, commercial clouds are predominantly US-based and many countries have policies making payments to foreign providers impractical. In developing countries, research computing infrastructure is rare and researchers often cannot afford commercial cloud-based computation. Here, we present the COVID-19 effort by the Galaxy Project, which pools free worldwide public computational infrastructure, making the analysis of deep sequencing data accessible to anyone while also providing an analytical framework for global pathogen genomic surveillance based on raw sequencing-read data.

Despite the existence of well-designed and validated SARS-CoV-2 data analysis approaches [2,3], the ad hoc [4] nature of their application often complicates the integration and comparison of analysis results. Public computational infrastructure (XSEDE, ELIXIR and Nectar Cloud in the United States, European Union and Australia, respectively) coupled with existing open-source software offers a solution to SARS-CoV-2 analytics challenges. However, glue is required to bind these resources into a unified platform for managing users, allocating storage and pairing analysis tools with appropriate computational resources. Such a platform is best not developed by a single principal investigator, group or institution, but rather supported by an international community of users, developers and educators.

We have developed a two-stage platform (Fig. 1) housed on three public Galaxy instances [5] in the United States (http://usegalaxy.org), the European Union (http://usegalaxy.eu) and Australia (http://usegalaxy.org.au) and capable of supporting hundreds of thousands of complex analyses per month. Anyone can run effectively unlimited
Summary: Making reproducible, auditable and scalable data-processing analysis workflows is an important challenge in the field of bioinformatics. Software containers and cloud computing have recently introduced a novel solution to these challenges: they simplify software installation, management and reproducibility by packaging tools together with their dependencies. In this work we implemented a cloud-provider-agnostic and scalable container orchestration setup for the popular Galaxy workflow environment. This solution enables Galaxy to run on and offload jobs to most cloud providers (e.g. Amazon Web Services, Google Cloud or OpenStack, among others) through the Kubernetes container orchestrator.

Availability: All code has been contributed to the Galaxy Project and is available (since Galaxy 17.05) at https://github.com/galaxyproject/ in the galaxy and galaxy-kubernetes repositories. https://public.phenomenal-h2020.eu/ is an example deployment.

Supplementary information: Supplementary files are available online.

Contact: pmoreno@ebi.ac.uk, European Molecular Biology Laboratory, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK, Tel: +44-1223-494267, Fax: +44-1223-484696.
The current state of much of the Wuhan pneumonia virus (severe acute respiratory syndrome coronavirus 2 [SARS-CoV-2]) research shows a regrettable lack of data sharing and considerable analytical obfuscation. This impedes global research cooperation, which is essential for tackling public health emergencies and requires unimpeded access to data, analysis tools, and computational infrastructure. Here, we show that community efforts in developing open analytical software tools over the past 10 years, combined with national investments in scientific computational infrastructure, can overcome these deficiencies and provide an accessible platform for tackling global health emergencies in an open and transparent manner. Specifically, we use all SARS-CoV-2 genomic data available in the public domain so far to (1) underscore the importance of access to raw data and (2) demonstrate that existing community efforts in curation and deployment of biomedical software can reliably support rapid, reproducible research during global health crises. All our analyses are fully documented at https://github.com/galaxyproject/SARS-CoV-2. The initial publications describing genomic features of SARS-CoV-2 [1-4] used Illumina and Oxford Nanopore data to elucidate the sequence composition of patient specimens (although only Wu and colleagues [3] explicitly provided the accession numbers for their raw short-read sequencing data). However, their approaches to processing, assembly, and analysis of raw data differed widely (Table 1) and ranged from transparent [3] to entirely opaque [4]. Such lack of analytical transparency sets a dangerous precedent. Infectious disease outbreaks often occur in