Identifying master regulators of biological processes and mapping their downstream gene networks are key challenges in systems biology. We developed a computational method, called iRegulon, to reverse-engineer the transcriptional regulatory network underlying a co-expressed gene set using cis-regulatory sequence analysis. iRegulon implements a genome-wide ranking-and-recovery approach to detect enriched transcription factor motifs and their optimal sets of direct targets. We increase the accuracy of network inference by using very large motif collections of up to ten thousand position weight matrices collected from various species, and linking these to candidate human TFs via a motif2TF procedure. We validate iRegulon on gene sets derived from ENCODE ChIP-seq data with increasing levels of noise, and we compare iRegulon with existing motif discovery methods. Next, we use iRegulon on more challenging types of gene lists, including microRNA target sets, protein-protein interaction networks, and genetic perturbation data. In particular, we over-activate p53 in breast cancer cells, perform RNA-seq and ChIP-seq, and identify an extensive up-regulated network controlled directly by p53. Similarly, we map a repressive network that shows no indication of direct p53 regulation but rather an indirect effect via E2F and NFY. Finally, we generalize our computational framework to include regulatory tracks such as ChIP-seq data and show how motif and track discovery can be combined to map functional regulatory interactions among co-expressed genes. iRegulon is available as a Cytoscape plugin from http://iregulon.aertslab.org.
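The ranking-and-recovery idea in the abstract above can be illustrated with a minimal sketch. It assumes that each motif comes with a genome-wide gene ranking, that enrichment is scored as the area under the recovery curve over the top-ranked genes, and that scores are normalized across motifs. The function names, the toy rankings, and the scoring details are hypothetical and are not taken from the abstract.

```python
from statistics import mean, pstdev


def recovery_auc(ranking, gene_set, top_n):
    """Area under the recovery curve over the top_n best-ranked genes.

    Scaled by top_n * len(gene_set), so the value lies in [0, 1].
    Assumes gene_set is non-empty.
    """
    hits = 0
    area = 0
    for gene in ranking[:top_n]:
        if gene in gene_set:
            hits += 1
        area += hits  # height of the recovery curve after this rank
    return area / (top_n * len(gene_set))


def normalized_enrichment(rankings, gene_set, top_n):
    """Z-score each motif's AUC against the AUCs of all motifs."""
    aucs = {m: recovery_auc(r, gene_set, top_n) for m, r in rankings.items()}
    values = list(aucs.values())
    mu, sd = mean(values), pstdev(values)
    return {m: (a - mu) / sd if sd > 0 else 0.0 for m, a in aucs.items()}


# Toy data (illustrative only): three motifs, each ranking eight genes.
gene_set = {"g1", "g2", "g3"}
rankings = {
    "motifA": ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"],
    "motifB": ["g8", "g7", "g6", "g5", "g4", "g3", "g2", "g1"],
    "motifC": ["g4", "g1", "g5", "g2", "g6", "g3", "g7", "g8"],
}
scores = normalized_enrichment(rankings, gene_set, top_n=4)
best_motif = max(scores, key=scores.get)
```

In this toy example the motif whose ranking places the whole gene set at the top receives the highest score. Genes from the input set that appear before a chosen rank cutoff would then be the candidate direct targets.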
Background: Massively parallel sequencing is a powerful tool for variant discovery and genotyping. To reduce costs, sequencing of restriction-enzyme-based reduced representation libraries can be used. This technology is generally referred to as Genotyping By Sequencing (GBS). To deal with GBS experimental design and initial processing, specific bioinformatic tools are needed. Results: GBSX is a package that assists in selecting the appropriate enzyme and designing compatible in-line barcodes. After sequencing, it performs optimized demultiplexing using these barcodes to create one fastq file per barcode, which can easily be plugged into existing variant analysis pipelines. Here we demonstrate the usability of the GBSX toolkit and show improved in-line barcode demultiplexing and trimming performance compared to existing tools. Conclusions: GBSX provides an easy-to-use suite of tools for designing and demultiplexing GBS experiments.
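As a rough illustration of in-line barcode demultiplexing, the sketch below assigns a read to a sample when its first bases match a barcode (allowing a limited number of mismatches) and the enzyme cut-site remnant follows directly after it. This is not GBSX's actual algorithm or interface. The function names, the barcodes, the sample names, and the cut-site string are all hypothetical values chosen for the example.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_read(seq, barcodes, cut_site, max_mismatches=1):
    """Return (sample, trimmed_seq), or (None, seq) if no barcode fits.

    A barcode fits when the read prefix is within max_mismatches of it
    and the cut-site remnant starts right after the barcode. Ties are
    broken by fewest mismatches, then by longest barcode.
    """
    best = None
    for barcode, sample in barcodes.items():
        n = len(barcode)
        if len(seq) < n:
            continue
        distance = hamming(seq[:n], barcode)
        if distance > max_mismatches:
            continue
        if not seq.startswith(cut_site, n):
            continue
        candidate = (distance, -n, sample, seq[n:])
        if best is None or candidate < best:
            best = candidate
    if best is None:
        return None, seq
    return best[2], best[3]


# Hypothetical variable-length barcodes and cut-site remnant.
barcodes = {"ACGT": "sample1", "TTGCA": "sample2"}
cut_site = "TGCAG"

exact = assign_read("ACGTTGCAGAAAC", barcodes, cut_site)
one_mismatch = assign_read("TTGGATGCAGCCC", barcodes, cut_site)
unassigned = assign_read("GGGGGGGGGG", barcodes, cut_site)
```

Requiring the cut site after the barcode is one way to handle barcodes of different lengths, because a shorter barcode that happens to match a prefix is rejected unless the remnant lines up.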
The Atlantic bluefin tuna is a highly migratory species emblematic of the challenges associated with shared fisheries management. In an effort to resolve the species' stock dynamics, a genomewide search for spatially informative single nucleotide polymorphisms (SNPs) was undertaken, by way of sequencing reduced representation libraries. An allele frequency approach to SNP discovery was used, combining the data of 555 larvae and young-of-the-year (LYOY) into pools representing major geographical areas and mapping against a newly assembled genomic reference. From a set of 184,895 candidate loci, 384 were selected for validation using 167 LYOY. A highly discriminatory genotyping panel of 95 SNPs was ultimately developed by selecting loci with the most pronounced differences between western Atlantic and Mediterranean Sea LYOY. The panel was evaluated by genotyping a different set of LYOY (n = 326), and from these, 77.8% and 82.1% were correctly assigned to western Atlantic and Mediterranean Sea origins, respectively. The panel revealed temporally persistent differentiation among LYOY from the western Atlantic and Mediterranean Sea (F_ST = 0.008, p = .034). The composition of six mixed feeding aggregations in the Atlantic Ocean and Mediterranean Sea was characterized using genotypes from medium (n = 184) and large (n = 48) adults, applying population assignment and mixture analyses. The results provide evidence of persistent population structuring across broad geographic areas and extensive mixing in the Atlantic Ocean, particularly in the mid-Atlantic Bight and Gulf of St. Lawrence. The genomic reference and genotyping tools presented here constitute novel resources useful for future research and conservation efforts.
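The abstract above does not say which assignment method was applied, so the following is only a generic sketch of likelihood-based population assignment from SNP genotypes. It assumes biallelic SNPs, Hardy-Weinberg proportions within each population, and independence between loci. The allele frequencies, population labels, and function names are invented for illustration and are not data from the study.

```python
import math


def genotype_log_likelihood(genotype, freqs, floor=0.01):
    """Log-likelihood of a multilocus genotype under one population.

    genotype: alternate-allele counts (0, 1 or 2), one per SNP.
    freqs: alternate-allele frequency of each SNP in the population.
    Frequencies are clamped to [floor, 1 - floor] to avoid log(0).
    """
    total = 0.0
    for count, p in zip(genotype, freqs):
        p = min(max(p, floor), 1.0 - floor)
        q = 1.0 - p
        # Hardy-Weinberg genotype probabilities for 0, 1, 2 alt alleles.
        total += math.log((q * q, 2.0 * p * q, p * p)[count])
    return total


def assign_population(genotype, pop_freqs):
    """Return the population under which the genotype is most likely."""
    return max(
        pop_freqs,
        key=lambda pop: genotype_log_likelihood(genotype, pop_freqs[pop]),
    )


# Invented allele frequencies for a three-SNP panel (illustrative only).
pop_freqs = {
    "western_atlantic": [0.8, 0.2, 0.7],
    "mediterranean": [0.3, 0.6, 0.2],
}
first = assign_population([2, 0, 2], pop_freqs)
second = assign_population([0, 2, 0], pop_freqs)
```

Loci with larger frequency differences between populations contribute more to the likelihood contrast, which is consistent with building a panel from the most differentiated SNPs.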
The common octopus, Octopus vulgaris, is an active marine predator known for the richness and plasticity of its behavioral repertoire, and remarkable learning and memory capabilities. Octopus and other coleoid cephalopods, cuttlefish and squid, possess the largest nervous system among invertebrates, both for cell counts and body to brain size. O. vulgaris has been at the center of a long tradition of research into diverse aspects of its biology. To leverage research in this iconic species, we generated 270 Gb of genomic sequencing data, complementing those available for the only other sequenced congeneric octopus, Octopus bimaculoides. We show that both genomes are similar in size, but display different levels of heterozygosity and repeats. Our data give a first quantitative glimpse into the rate of coding and non-coding regions and support the view that hundreds of novel genes may have arisen independently despite the close phylogenetic distance. We furthermore describe a reference-guided assembly and an open genomic resource (CephRes-gdatabase), opening new avenues in the study of genomic novelties in cephalopods and their biology.