Background: Sequencing data has become a standard measure of diverse cellular activities. For example, gene expression is accurately measured by RNA sequencing (RNA-Seq) libraries, protein-DNA interactions are captured by chromatin immunoprecipitation sequencing (ChIP-Seq), protein-RNA interactions by crosslinking immunoprecipitation sequencing (CLIP-Seq) or RNA immunoprecipitation sequencing (RIP-Seq), and DNA accessibility by assay for transposase-accessible chromatin (ATAC-Seq), DNase, or MNase sequencing libraries. Processing the data from these sequencing techniques involves library-specific approaches. However, in all cases, once the sequencing libraries are processed, the result is a count table specifying the estimated number of reads originating from each genomic locus. Differential analysis to determine which loci have different cellular activity under different conditions starts with the count table and iterates through a cycle of data assessment, preparation, and analysis. Such complex analysis often relies on multiple programs and is therefore a challenge for those without programming skills. Results: We developed DEBrowser as an R Bioconductor project to interactively visualize every step of the differential analysis, without programming. The application provides a rich and interactive web-based graphical user interface built on R's shiny infrastructure. DEBrowser allows users to visualize data with various types of graphs that can be explored further by selecting and re-plotting any desired subset of data. Using the visualization approaches provided, users can identify and correct sources of technical variation, such as batch effects and sequencing depth, that affect differential analysis. We show DEBrowser's ease of use by reproducing the analysis of two previously published data sets. Conclusions: DEBrowser is a flexible, intuitive, web-based analysis platform that enables an iterative and interactive analysis of count data without any requirement of programming knowledge. Electronic supplementary material: The online version of this article (10.1186/s12864-018-5362-x) contains supplementary material, which is available to authorized users.
Background: Sequencing data has become a standard measure for studying diverse cellular activities. For example, gene expression is accurately measured by RNA sequencing (RNA-Seq) libraries, protein-DNA interactions are captured by chromatin immunoprecipitation sequencing (ChIP-Seq), protein-RNA interactions by crosslinking immunoprecipitation (CLIP-Seq) or RNA immunoprecipitation (RIP-Seq) sequencing, and DNA accessibility by assay for transposase-accessible chromatin (ATAC-Seq), DNase, or MNase sequencing libraries. Analysis of these sequencing techniques involves library-specific approaches. However, in all cases, once the sequencing libraries are processed, the result is a count table specifying the estimated number of reads originating from a genomic locus. Differential analysis to determine which loci have different cellular activity under different conditions starts with the count table and iterates through a cycle of data assessment, preparation, and analysis. Such an iterative approach relies on multiple programs and is therefore a challenge for those without programming skills. Results: We developed DEBrowser, as an R Bioconductor project, to interactively visualize each step of the differential analysis of count data, without any requirement for programming expertise. The application presents a rich and interactive web-based graphical user interface built on R's shiny infrastructure. We use shiny's reactive programming interface for a dynamic webpage that responds to user input and integrates its visualization widgets at each stage of the analysis. In this way, every step of the analysis can be displayed in one application that combines many approaches and multiple results. We show DEBrowser's capabilities by reproducing the analysis of two previously published data sets. Conclusions: DEBrowser is a flexible, intuitive, web-based analysis platform that enables an iterative and interactive analysis of count data without any requirement of programming knowledge.
The zebrafish is ideal for studying embryogenesis and is increasingly applied to model human disease. In these contexts, RNA-sequencing (RNA-seq) provides mechanistic insights by identifying transcriptome changes between experimental conditions. Application of RNA-seq relies on accurate transcript annotation for a genome of interest. Here, we find discrepancies in analysis from RNA-seq datasets quantified using Ensembl and RefSeq zebrafish annotations. These issues were due, in part, to variably annotated 3' untranslated regions and thousands of gene models missing from each annotation. Since these discrepancies could compromise downstream analyses and biological reproducibility, we built a more comprehensive zebrafish transcriptome annotation that addresses these deficiencies. Our annotation improves detection of cell type-specific genes in both bulk and single cell RNA-seq datasets, where it also improves resolution of cell clustering. Thus, we demonstrate that our new transcriptome annotation can outperform existing annotations, providing an important resource for zebrafish researchers.
Background: The emergence of high-throughput technologies that produce vast amounts of genomic data, such as next-generation sequencing (NGS), is transforming biological research. The dramatic increase in the volume of data, together with the variety and continuous change of data processing tools, algorithms, and databases, makes analysis the main bottleneck for scientific discovery. The processing of high-throughput datasets typically involves many different computational programs, each of which performs a specific step in a pipeline. Given the wide range of applications and organizational infrastructures, there is a great need for highly parallel, flexible, portable, and reproducible data processing frameworks. Several platforms currently exist for the design and execution of complex pipelines. Unfortunately, current platforms lack the necessary combination of parallelism, portability, flexibility, and/or reproducibility that is required by the current research environment. To address these shortcomings, workflow frameworks that provide a platform to develop and share portable pipelines have recently arisen. We complement these new platforms by providing a graphical user interface to create, maintain, and execute complex pipelines. Such a platform will simplify robust and reproducible workflow creation for non-technical users as well as provide a robust platform to maintain pipelines for large organizations. Results: To simplify development, maintenance, and execution of complex pipelines, we created DolphinNext. DolphinNext facilitates building and deployment of complex pipelines using a modular approach implemented in a graphical interface that relies on the powerful Nextflow workflow framework by providing: 1. A drag-and-drop user interface that visualizes pipelines and allows users to create pipelines without familiarity with the underlying programming languages. 2. Modules to execute and monitor pipelines in distributed computing environments such as high-performance clusters and/or the cloud. 3. Reproducible pipelines with version tracking and stand-alone versions that can be run independently. 4. Modular process design with process revisioning support to increase reusability and pipeline development efficiency. 5. Pipeline sharing with GitHub and automated testing. 6. Extensive reports with Rmarkdown and shiny support for interactive data visualization and analysis. Conclusion: DolphinNext is a flexible, intuitive, web-based data processing and analysis platform that enables creating, deploying, sharing, and executing complex Nextflow pipelines with extensive revisioning and interactive reporting to enhance the reproducibility of results.
Following testicular spermatogenesis, mammalian sperm continue to mature in a long epithelial tube known as the epididymis, which plays key roles in remodeling sperm protein, lipid, and RNA composition. To understand the roles for the epididymis in reproductive biology, we generated a single-cell atlas of the murine epididymis and vas deferens. We recovered key epithelial cell types including principal cells, clear cells, and basal cells, along with associated support cells that include fibroblasts, smooth muscle, macrophages and other immune cells. Moreover, our data illuminate extensive regional specialization of principal cell populations across the length of the epididymis. In addition to region-specific specialization of principal cells, we find evidence for functionally specialized subpopulations of stromal cells, and, most notably, two distinct populations of clear cells. Our dataset extends on existing knowledge of epididymal biology, and provides a wealth of information on potential regulatory and signaling factors that bear future investigation.
Rationale: Significant progress has revealed transcriptional inputs that underlie regulation of artery and vein endothelial cell fates. However, little is known concerning genome-wide regulation of this process. Therefore, such studies are warranted to address this gap. Objective: To identify and characterize artery- and vein-specific endothelial enhancers in the human genome, thereby gaining insights into mechanisms by which blood vessel identity is regulated. Methods and Results: Using chromatin immunoprecipitation and deep sequencing for markers of active chromatin in human arterial and venous endothelial cells, we identified several thousand artery- and vein-specific regulatory elements. Computational analysis revealed that NR2F2 (nuclear receptor subfamily 2, group F, member 2) sites were overrepresented in vein-specific enhancers, suggesting a direct role in promoting vein identity. Subsequent integration of chromatin immunoprecipitation and deep sequencing data sets with RNA sequencing revealed that NR2F2 regulated 3 distinct aspects related to arteriovenous identity. First, consistent with previous genetic observations, NR2F2 directly activated enhancer elements flanking cell cycle genes to drive their expression. Second, NR2F2 was essential to directly activate vein-specific enhancers and their associated genes. Our genomic approach further revealed that NR2F2 acts with ERG (ETS-related gene) at many of these sites to drive vein-specific gene expression. Finally, NR2F2 directly repressed only a small number of artery enhancers in venous cells to prevent their activation, including a distal element upstream of the artery-specific transcription factor, HEY2 (hes related family bHLH transcription factor with YRPW motif 2). In arterial endothelial cells, this enhancer was normally bound by ERG, which was also required for arterial HEY2 expression. 
By contrast, in venous endothelial cells, NR2F2 was bound to this site, together with ERG, and prevented its activation. Conclusions: By leveraging a genome-wide approach, we revealed mechanistic insights into how NR2F2 functions in multiple roles to maintain venous identity. Importantly, characterization of its role at a crucial artery enhancer upstream of HEY2 established a novel mechanism by which artery-specific expression can be achieved.
Following spermatogenesis in the testis, mammalian sperm continue to mature over the course of approximately 10 days as they transit a long epithelial tube known as the epididymis. The epididymis is comprised of multiple segments/compartments that, in addition to concentrating sperm and preventing their premature activation, play key roles in remodeling the protein, lipid, and RNA composition of maturing sperm. In order to understand the complex roles for the epididymis in reproductive biology, we generated a single cell atlas of gene expression from the murine epididymis and vas deferens. We recovered all the key cell types of the epididymal epithelium, including principal cells, clear cells, and basal cells, along with associated support cells that include fibroblasts, smooth muscle, macrophages and other immune cells. Moreover, our data illuminate extensive regional specialization of principal cell populations across the length of the epididymis, with a substantial fraction of segment-specific genes localized in genomic clusters of functionally-related genes. In addition to the extensive region-specific specialization of principal cells, we find evidence for functionally-specialized subpopulations of stromal cells, and, most notably, two distinct populations of clear cells. Analysis of ligand/receptor expression reveals a network of potential cellular signaling connections, with several predicted interactions between cell types that may play roles in immune cell recruitment and other aspects of epididymal function. Our dataset extends on existing knowledge of epididymal biology, and provides a wealth of information on potential regulatory and signaling factors that bear future investigation.