We systematically generated large-scale data sets to improve genome annotation for the nematode Caenorhabditis elegans, a key model organism. These data sets include transcriptome profiling across a developmental time course, genome-wide identification of transcription factor–binding sites, and maps of chromatin organization. From this, we created more complete and accurate gene models, including alternative splice forms and candidate noncoding RNAs. We constructed hierarchical networks of transcription factor–binding and microRNA interactions and discovered chromosomal locations bound by an unusually large number of transcription factors. Different patterns of chromatin composition and histone modification were revealed between chromosome arms and centers, with similarly prominent differences between autosomes and the X chromosome. Integrating data types, we built statistical models relating chromatin, transcription factor binding, and gene expression. Overall, our analyses ascribed putative functions to most of the conserved genome.
The Generic Model Organism System Database Project (GMOD) seeks to develop reusable software components for model organism system databases. In this paper we describe the Generic Genome Browser (GBrowse), a Web-based application for displaying genomic annotations and other features. For the end user, features of the browser include the ability to scroll and zoom through arbitrary regions of a genome, to enter a region of the genome by searching for a landmark or performing a full text search of all features, and the ability to enable and disable tracks and change their relative order and appearance. The user can upload private annotations to view them in the context of the public ones, and publish those annotations to the community. For the data provider, features of the browser software include reliance on readily available open source components, simple installation, flexible configuration, and easy integration with other components of a model organism system Web site. GBrowse is freely available under an open source license. The software, its documentation, and support are available at http://www.gmod.org.
Model organism system databases (MODs) are a vital tool for scientific research. They share a common set of tasks: to collect and curate data from the scientific literature such as mutations, alleles, genetic and physical maps, and phenotypes; to integrate this information with the results of large-scale experiments such as microarray studies, SNP screens, and protein-interaction studies; to provide reagent resources such as stocks, genetic constructs, and clones; and, lastly, to provide a common nomenclature for gene symbols, anatomic terms, and other elements of the scientific vocabulary. By integrating, and in some cases reanalyzing, these data, MODs are able to greatly enhance their value.
This information is made available to the research community via a Web site that also serves as a nexus for discussions, announcements of interest to the community, and data submissions.
Three-prime untranslated regions (3′UTRs) of metazoan messenger RNAs (mRNAs) contain numerous regulatory elements, yet remain largely uncharacterized. Using polyA capture, 3′ rapid amplification of complementary DNA (cDNA) ends, full-length cDNAs, and RNA-seq, we defined ∼26,000 distinct 3′UTRs in Caenorhabditis elegans for ∼85% of the 18,328 experimentally supported protein-coding genes and revised ∼40% of gene models. Alternative 3′UTR isoforms are frequent, often differentially expressed during development. Average 3′UTR length decreases with animal age. Surprisingly, no polyadenylation signal (PAS) was detected for 13% of polyadenylation sites, predominantly among shorter alternative isoforms. Trans-spliced (versus non–trans-spliced) mRNAs possess longer 3′UTRs and frequently contain no PAS or variant PAS. We identified conserved 3′UTR motifs, isoform-specific predicted microRNA target sites, and polyadenylation of most histone genes. Our data reveal a rich complexity of 3′UTRs, both genome-wide and throughout development.
Background: Tissue-specific RNA plasticity broadly impacts the development, tissue identity and adaptability of all organisms, but changes in composition, expression levels and its impact on gene regulation in different somatic tissues are largely unknown. Here we developed a new method, polyA-tagging and sequencing (PAT-Seq) to isolate high-quality tissue-specific mRNA from Caenorhabditis elegans intestine, pharynx and body muscle tissues and study changes in their tissue-specific transcriptomes and 3’UTRomes.
Results: We have identified thousands of novel genes and isoforms differentially expressed between these three tissues. The intestine transcriptome is expansive, expressing over 30% of C. elegans mRNAs, while muscle transcriptomes are smaller but contain characteristic unique gene signatures. Active promoter regions in all three tissues reveal both known and novel enriched tissue-specific elements, along with putative transcription factors, suggesting novel tissue-specific modes of transcription initiation. We have precisely mapped approximately 20,000 tissue-specific polyadenylation sites and discovered that about 30% of transcripts in somatic cells use alternative polyadenylation in a tissue-specific manner, with their 3’UTR isoforms significantly enriched with microRNA targets.
Conclusions: For the first time, PAT-Seq allowed us to directly study tissue specific gene expression changes in an in vivo setting and compare these changes between three somatic tissues from the same organism at single-base resolution within the same experiment.
We pinpoint precise tissue-specific transcriptome rearrangements and for the first time link tissue-specific alternative polyadenylation to miRNA regulation, suggesting novel and unexplored tissue-specific post-transcriptional regulatory networks in somatic cells.
Electronic supplementary material: The online version of this article (doi:10.1186/s12915-015-0116-6) contains supplementary material, which is available to authorized users.
As a step towards comprehensive functional analysis of genomes, systematic gene knockout projects have been initiated in several organisms [1]. In metazoans like C. elegans, however, maternal contribution can mask the effects of gene knockouts on embryogenesis. RNA interference (RNAi) provides an alternative rapid approach to obtain loss-of-function information that can also reveal embryonic roles for the genes targeted [2,3]. We have used RNAi to analyze a random set of ovarian transcripts and have identified 81 genes with essential roles in embryogenesis. Surprisingly, none of them maps on the X chromosome. Of these 81 genes, 68 showed defects before the eight-cell stage and could be grouped into ten phenotypic classes. To archive and distribute these data we have developed a database system directly linked to the C. elegans database (WormBase). We conclude that screening cDNA libraries by RNAi is an efficient way of obtaining in vivo function for a large group of genes. Furthermore, this approach is directly applicable to other organisms sensitive to RNAi and whose genomes have not yet been sequenced.
Alternative polyadenylation (APA) is observed in virtually all metazoans and results in mRNA isoforms with different 3’ends. It is routinely...
MicroRNAs (miRNAs) are short non-coding RNAs that regulate gene output at the post-transcriptional level by targeting degenerate elements primarily in 3′ untranslated regions (3′UTRs) of mRNAs. Individual miRNAs can regulate networks of hundreds of genes, yet for the majority of miRNAs few, if any, targets are known. Misexpression of miRNAs is also a major contributor to cancer progression, thus there is a critical need to validate miRNA targets in high-throughput to understand miRNAs' contribution to tumorigenesis. Here we introduce a novel high-throughput assay to detect miRNA targets in 3′UTRs, called Luminescent Identification of Functional Elements in 3′UTRs (3′LIFE). We demonstrate the feasibility of 3′LIFE using a data set of 275 human 3′UTRs and two cancer-relevant miRNAs, let-7c and miR-10b, and compare our results to alternative methods to detect miRNA targets throughout the genome. We identify a large number of novel gene targets for these miRNAs, with only 32% of hits being bioinformatically predicted and 27% directed by non-canonical interactions. Functional analysis of target genes reveals consistent roles for each miRNA as either a tumor suppressor (let-7c) or oncogenic miRNA (miR-10b), and shows that these miRNAs preferentially target multiple genes within regulatory networks, suggesting 3′LIFE is a rapid and sensitive method to detect miRNA targets in high-throughput.
3′ Untranslated regions (3′ UTRs) of mRNAs emerged as central regulators of cellular function because they contain important but poorly characterized cis-regulatory elements targeted by a multitude of regulatory factors. The model nematode Caenorhabditis elegans is ideal to study these interactions because it possesses a well-defined 3′ UTRome. To improve its annotation, we have used a genome-wide bioinformatics approach to download raw transcriptome data for 1088 transcriptome data sets corresponding to the entire collection of C. elegans transcriptomes from 2015 to 2018 from the Sequence Read Archive at the NCBI. We then extracted and mapped high-quality 3′-UTR data at ultradeep coverage. Here, we describe and release to the community the updated version of the worm 3′ UTRome, which we named 3′ UTRome v2. This resource contains high-quality 3′-UTR data mapped at single-base ultraresolution for 23,084 3′-UTR isoform variants corresponding to 14,788 protein-coding genes and is updated to the latest release of WormBase. We used this data set to study and probe principles of mRNA cleavage and polyadenylation in C. elegans. The worm 3′ UTRome v2 represents the most comprehensive and high-resolution 3′-UTR data set available in C. elegans and provides a novel resource to investigate the mRNA cleavage and polyadenylation reaction, 3′-UTR biology, and miRNA targeting in a living organism.