We systematically generated large-scale data sets to improve genome annotation for the nematode Caenorhabditis elegans, a key model organism. These data sets include transcriptome profiling across a developmental time course, genome-wide identification of transcription factor–binding sites, and maps of chromatin organization. From this, we created more complete and accurate gene models, including alternative splice forms and candidate noncoding RNAs. We constructed hierarchical networks of transcription factor–binding and microRNA interactions and discovered chromosomal locations bound by an unusually large number of transcription factors. Different patterns of chromatin composition and histone modification were revealed between chromosome arms and centers, with similarly prominent differences between autosomes and the X chromosome. Integrating data types, we built statistical models relating chromatin, transcription factor binding, and gene expression. Overall, our analyses ascribed putative functions to most of the conserved genome.
Caenorhabditis elegans is an animal with few cells but a wide diversity of cell types. In this study, we characterize the molecular basis for their specification by profiling the transcriptomes of 86,024 single embryonic cells. We identify 502 terminal and preterminal cell types, mapping most single-cell transcriptomes to their exact position in C. elegans’ invariant lineage. Using these annotations, we find that (i) the correlation between a cell’s lineage and its transcriptome increases from middle to late gastrulation, then falls substantially as cells in the nervous system and pharynx adopt their terminal fates; (ii) multilineage priming contributes to the differentiation of sister cells at dozens of lineage branches; and (iii) most distinct lineages that produce the same anatomical cell type converge to a homogenous transcriptomic state.
Regulation of gene expression by sequence-specific transcription factors is central to developmental programs and depends on the binding of transcription factors to target sites in the genome. To date, most such analyses in Caenorhabditis elegans have focused on the interactions between a single transcription factor and one or a few select target genes. As part of the modENCODE Consortium, we have used chromatin immunoprecipitation coupled with high-throughput DNA sequencing (ChIP-seq) to determine the genome-wide binding sites of 22 transcription factors (ALR-1, BLMP-1, CEH-14, CEH-30, EGL-27, EGL-5, ELT-3, EOR-1, GEI-11, HLH-1, LIN-11, LIN-13, LIN-15B, LIN-39, MAB-5, MDL-1, MEP-1, PES-1, PHA-4, PQM-1, SKN-1, and UNC-130) at diverse developmental stages. For each factor we determined candidate gene targets, both coding and non-coding. The typical binding sites of almost all factors are within a few hundred nucleotides of the transcript start site. Most factors target a mixture of coding and non-coding genes, although one factor preferentially binds to non-coding RNA genes. We built a regulatory network among the 22 factors to determine their functional relationships to each other and found that some factors appear to act preferentially as regulators and others as target genes. Examination of the binding targets of three related HOX factors (LIN-39, MAB-5, and EGL-5) indicates that these factors regulate genes involved in cellular migration, neuronal function, and vulval differentiation, consistent with their known roles in these developmental processes. Ultimately, the comprehensive mapping of transcription factor binding sites will identify features of transcriptional networks that regulate C. elegans developmental processes.
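The regulator-versus-target distinction in the abstract above can be illustrated by comparing out-degree and in-degree in a directed network. The following is only a minimal sketch under stated assumptions: the edge list, the placeholder names (`TF_A` to `TF_D`), and the helper functions `degree_table` and `classify` are hypothetical and are not taken from the study, whose actual network construction is not described here.

```python
from collections import defaultdict


def degree_table(edges):
    """Return {factor: (out_degree, in_degree)} for directed (regulator, target) edges."""
    out_deg = defaultdict(int)
    in_deg = defaultdict(int)
    for regulator, target in edges:
        out_deg[regulator] += 1  # the factor binds near another factor's gene
        in_deg[target] += 1      # the factor's own gene is bound by another factor
    factors = sorted(set(out_deg) | set(in_deg))
    return {f: (out_deg[f], in_deg[f]) for f in factors}


def classify(out_degree, in_degree):
    """Label a factor by whichever degree dominates (an illustrative rule only)."""
    if out_degree > in_degree:
        return "regulator"
    if in_degree > out_degree:
        return "target"
    return "balanced"


# Hypothetical edges with placeholder names; NOT data from the study.
EDGES = [
    ("TF_A", "TF_B"),
    ("TF_A", "TF_C"),
    ("TF_A", "TF_D"),
    ("TF_B", "TF_D"),
    ("TF_C", "TF_D"),
]

table = degree_table(EDGES)
roles = {factor: classify(*degrees) for factor, degrees in table.items()}

for factor, (out_degree, in_degree) in table.items():
    print(factor, out_degree, in_degree, roles[factor])
```

A simple degree comparison like this ignores binding strength and stage specificity, so it should be read as a toy model of the idea rather than a reconstruction of the published analysis.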
Understanding the in vivo dynamics of protein localization and their physical interactions is important for many problems in biology. To enable systematic protein function interrogation in a multicellular context, we built a genome-scale transgenic platform for in vivo expression of fluorescent and affinity-tagged proteins in Caenorhabditis elegans under endogenous cis regulatory control. The platform combines computer-assisted transgene design, massively parallel DNA engineering, and next-generation sequencing to generate a resource of 14,637 genomic DNA transgenes, which covers 73% of the proteome. The multipurpose tag used allows any protein of interest to be localized in vivo or affinity purified using standard tag-based assays. We illustrate the utility of the resource by systematic chromatin immunopurification and automated 4D imaging, which produced detailed DNA binding and cell/tissue distribution maps for key transcription factor proteins.
How cells adopt different expression patterns is a fundamental question of developmental biology. We quantitatively measured reporter expression of 127 genes, primarily transcription factors, in every cell and with high temporal resolution in C. elegans embryos. Embryonic cells are highly distinct in their gene expression; expression of the 127 genes studied here can distinguish nearly all pairs of cells, even between cells of the same tissue type. We observed recurrent lineage-regulated expression patterns for many genes in diverse contexts. These patterns are regulated in part by the TCF-LEF transcription factor POP-1. Other genes' reporters exhibited patterns correlated with tissue, position, and left–right asymmetry. Sequential patterns both within tissues and series of sublineages suggest regulatory pathways. Expression patterns often differ between embryonic and larval stages for the same genes, emphasizing the importance of profiling expression in different stages. This work greatly expands the number of genes in each of these categories and provides the first large-scale, digitally based, cellular resolution compendium of gene expression dynamics in live animals. The resulting data sets will be a useful resource for future research.
C. elegans is an animal with few cells, but a striking diversity of cell types. Here, we characterize the molecular basis for their specification by profiling the transcriptomes of 84,625 single embryonic cells. We identify 284 terminal and pre-terminal cell types, mapping most single cell transcriptomes to their exact position in C. elegans' invariant lineage. We use these annotations to perform the first quantitative analysis of the relationship between lineage and the transcriptome for a whole organism. We find that a strong lineage-transcriptome correlation in the early embryo breaks down in the final two cell divisions as cells adopt their terminal fates and that most distinct lineages that produce the same anatomical cell type converge to a homogenous transcriptomic state. Users can explore our data with a graphical application "VisCello".
Main text: To understand how cell fates are specified during development, it is essential to know the temporal sequence of gene expression in cells during their trajectories from uncommitted precursors to differentiated terminal cell types. Gene expression patterns near branch points in these trajectories can help identify candidate regulators of cell fate decisions (1). Single cell RNA sequencing (sc-RNA-seq) has made it possible to obtain comprehensive measurements of
Fig 3. Developmental trajectories of ciliated neurons. (A) UMAP of ciliated neurons and precursors. Colors correspond to cell identity. Text labels indicate terminal cells. Numbers 1-13 indicate parents of 1 ADE-ADA, 2 CEP-URX, 3 IL1, 4 OLL, 5 OLQ, 6 ASJ-AUA, 7 ASE, 8 ASI, 9 ASK, 10 ADF-AWB, 11 ASG-AWA, 12 ADL, 13 AFD-RMD. 3-5, 7-9, and 12 are listed as parents of only one cell type as the sister cells die. Numbers 14-17 indicate grandparents of 14 IL1 (= IL2 parent), 15 OLQ-URY, 16, 17 ASE-ASJ-AUA. 18 indicates a progenitor cluster that includes the AWC-SAAVx and BAG-SMDVx parents, which were identified in a separate UMAP (Fig. S12C). This latter analysis also tentatively identified a few cells near the base of the ASH trajectory as the ASH-RIB parent. Late stage AUA cells cluster with non-ciliated neurons and are not included in this UMAP but are included in the heatmap in panel D. The tiny cluster of cells labeled with an asterisk (*) is putatively AWC-ON based on srt-28 expression. (B) UMAP plot colored by embryo time (colors matched to Fig. 1A) and gene expression (red indicates >0 reads for the listed gene). mcm-7 is a gene associated with the cell cycle. unc-130 is known to be expressed in the ASG-AWA neuroblast but neither terminal cell (40). (C) Cartoon illustrating the lineage of the ASE, ASJ, and AUA neurons. (D) Heatmap showing patterns of differential transcription factor expression associated with branches in the ASE-ASJ-AUA lineage. Expression values are log-transformed, then centered and scaled by standard deviation for each row (gene).
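The row-wise scaling described for the heatmap in panel D (log-transform, then center and scale each gene by its standard deviation) can be sketched as follows. This is a minimal illustration under assumptions that the caption does not specify: a base-2 logarithm, a pseudocount of 1, and the sample standard deviation. The function name `scale_row` and the toy values are hypothetical.

```python
import math
from statistics import mean, stdev


def scale_row(values, pseudocount=1.0):
    """Log-transform one gene's expression values, then center and scale them.

    Assumptions (not stated in the caption): log base 2, pseudocount of 1,
    and the sample standard deviation (n - 1 denominator).
    """
    logged = [math.log2(v + pseudocount) for v in values]
    mu = mean(logged)
    sd = stdev(logged)
    if sd == 0:
        # A gene with constant expression carries no contrast; return zeros
        # rather than dividing by zero.
        return [0.0 for _ in logged]
    return [(x - mu) / sd for x in logged]


# Hypothetical expression values for one gene across four cell groups.
toy_row = [0, 1, 3, 7]
scaled = scale_row(toy_row)

# A gene with identical expression everywhere scales to all zeros.
flat = scale_row([5, 5, 5, 5])

print([round(x, 3) for x in scaled])
print(flat)
```

Scaling each row separately puts every gene on a common scale, so the heatmap colors reflect where a gene is relatively high or low rather than its absolute abundance.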
Transcription factors are key components of regulatory networks that control development, as well as the response to environmental stimuli. We have established an experimental pipeline in Caenorhabditis elegans that permits global identification of the binding sites for transcription factors using chromatin immunoprecipitation and deep sequencing. We describe and validate this strategy, and apply it to the transcription factor PHA-4, which plays critical roles in organ development and other cellular processes. We identified thousands of binding sites for PHA-4 during formation of the embryonic pharynx, and also found a role for this factor during the starvation response. Many binding sites were found to shift dramatically between embryos and starved larvae, from developmentally regulated genes to genes involved in metabolism. These results indicate distinct roles for this regulator in two different biological processes and demonstrate the versatility of transcription factors in mediating diverse biological roles.
The C. elegans cell lineage provides a unique opportunity to look at how cell lineage affects patterns of gene expression. We developed an automatic cell lineage analyzer that converts high-resolution images of worms into a data table showing fluorescence expression with single cell resolution. We generated expression profiles of 93 genes in 363 specific cells from L1 stage larvae and found that cells with identical fates can be formed by different gene regulatory pathways. We used molecular signatures to find repeating cell fate modules within the cell lineage and to create a molecular differentiation map, which shows points in the cell lineage when developmental fates of daughter cells begin to diverge. These results demonstrate insights that become possible using computational approaches to analyze quantitative expression from many genes in parallel using a digital gene expression atlas.