Functional genomics assays based on high-throughput sequencing greatly expand our ability to understand the genome. Here, we define the ENCODE blacklist: a comprehensive set of regions in the human, mouse, worm, and fly genomes that have anomalous, unstructured, or high signal in next-generation sequencing experiments independent of cell line or experiment. The removal of the ENCODE blacklist is an essential quality measure when analyzing functional genomics data.
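As a minimal sketch of what removing blacklisted regions can look like in practice, the Python below drops any interval that overlaps a blacklist region. It assumes BED-style half-open coordinates held as `(chrom, start, end)` tuples; the function name `remove_blacklisted` and all coordinates are hypothetical illustrations, not part of the ENCODE tooling or the real blacklist.

```python
from collections import defaultdict


def remove_blacklisted(peaks, blacklist):
    """Return the peaks that do not overlap any blacklist interval.

    Both inputs are iterables of (chrom, start, end) tuples using
    BED-style half-open coordinates (start inclusive, end exclusive).
    This is an illustrative helper, not an ENCODE API.
    """
    # Group blacklist intervals by chromosome so each peak is only
    # compared against regions on its own chromosome.
    by_chrom = defaultdict(list)
    for chrom, start, end in blacklist:
        by_chrom[chrom].append((start, end))

    kept = []
    for chrom, start, end in peaks:
        regions = by_chrom.get(chrom, ())
        # Two half-open intervals overlap when each starts before the other ends.
        if not any(start < b_end and b_start < end for b_start, b_end in regions):
            kept.append((chrom, start, end))
    return kept


# Hypothetical coordinates, for illustration only.
blacklist = [("chr1", 1000, 2000), ("chr2", 500, 800)]
peaks = [
    ("chr1", 100, 300),    # no overlap, kept
    ("chr1", 1900, 2100),  # overlaps chr1:1000-2000, removed
    ("chr2", 800, 900),    # adjacent to chr2:500-800 but not overlapping, kept
    ("chr3", 10, 20),      # chromosome has no blacklist entries, kept
]
filtered = remove_blacklisted(peaks, blacklist)
```

The linear scan per chromosome is adequate for a sketch; for genome-scale peak sets an interval tree or a sorted sweep would be the more usual choice.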
The Saccharomyces cerevisiae ribosomal DNA (rDNA) locus is known to exhibit greater instability relative to the rest of the genome. However, wild-type cells preferentially maintain a stable number of rDNA copies, suggesting underlying genetic control of the size of this locus. We performed a screen of a subset of the Yeast Knock-Out (YKO) single gene deletion collection to identify genetic regulators of this locus and to determine if rDNA copy number correlates with yeast replicative lifespan. While we found no correlation between replicative lifespan and rDNA size, we identified 64 candidate strains with significant rDNA copy number differences. However, in the process of validating candidate rDNA variants, we observed that independent isolates of our de novo gene deletion strains had unsolicited but significant changes in rDNA copy number. Moreover, we were not able to recapitulate rDNA phenotypes from the YKO yeast deletion collection. Instead, we found that the standard lithium acetate transformation protocol is a significant source of rDNA copy number variation, with lithium acetate exposure being the treatment causing variable rDNA copy number events after transformation. As the effects of variable rDNA copy number are being increasingly reported, our finding that rDNA is affected by lithium acetate exposure suggested that rDNA copy number variants may be influential passenger mutations in standard strain construction in S. cerevisiae.
A form of dwarfism known as Meier-Gorlin syndrome (MGS) is caused by recessive mutations in one of six different genes (ORC1, ORC4, ORC6, CDC6, CDT1, and MCM5). These genes encode components of the pre-replication complex, which assembles at origins of replication prior to S phase. Also, variants in two additional replication initiation genes have joined the list of causative mutations for MGS (Geminin and CDC45). The identity of the causative MGS genetic variants strongly suggests that some aspect of replication is amiss in MGS patients; however, little evidence has been obtained regarding what aspect of chromosome replication is faulty. Since the site of one of the missense mutations in the human ORC4 alleles is conserved between humans and yeast, we sought to determine in what way this single amino acid change affects the process of chromosome replication, by introducing the comparable mutation into yeast (orc4Y232C). We find that yeast cells with the orc4Y232C allele have a prolonged S-phase, due to compromised replication initiation at the ribosomal DNA (rDNA) locus located on chromosome XII. The inability to initiate replication at the rDNA locus results in chromosome breakage and a severely reduced rDNA copy number in the survivors, presumably helping to ensure complete replication of chromosome XII. Although reducing rDNA copy number may help ensure complete chromosome replication, orc4Y232C cells struggle to meet the high demand for ribosomal RNA synthesis. This finding provides additional evidence linking two essential cellular pathways—DNA replication and ribosome biogenesis.
Free-living bacteria adapt to changes in the environment by reprogramming gene expression through precise interactions of hundreds of DNA-binding proteins. Technologies such as ChIP-seq enable targeted characterization of regulatory interactions for individual DNA-binding proteins. However, in order to understand the cell's global regulatory logic, we need to simultaneously monitor all such interactions in response to diverse genetic and environmental perturbations. To address this challenge, we have developed high-resolution in vivo protein occupancy display (IPOD-HR), a technology that enables rapid, quantitative, and comprehensive monitoring of DNA-protein interactions across a bacterial chromosome. IPOD-HR enables simultaneous activity profiling of all known sequence-specific transcription factors, discovery of novel condition-dependent DNA-binding proteins, and systematic inference of binding specificity models for all bound transcription factors. IPOD-HR also reveals many large domains of extended protein occupancy in Escherichia coli that define relatively stable, transcriptionally silent regions with unique sequence and gene functional features.
Free-living bacteria adapt to environmental change by reprogramming gene expression through precise interactions of hundreds of DNA-binding proteins. A predictive understanding of bacterial physiology requires us to globally monitor all such protein–DNA interactions across a range of environmental and genetic perturbations. Here, we show that such global observations are possible using an optimized version of in vivo protein occupancy display technology (high-resolution in vivo protein occupancy display, IPOD-HR) and present a pilot application to Escherichia coli. We observe that the E. coli protein–DNA interactome organizes into 2 distinct prototypic features: (1) highly dynamic condition-dependent transcription factor (TF) occupancy; and (2) robust kilobase-scale occupancy by nucleoid factors, forming silencing domains analogous to eukaryotic heterochromatin. We show that occupancy dynamics across a range of conditions can rapidly reveal the global transcriptional regulatory organization of a bacterium. Beyond discovery of previously hidden regulatory logic, we show that these observations can be utilized to computationally determine sequence specificity models for the majority of active TFs. Our study demonstrates that global observations of protein occupancy combined with statistical inference can rapidly and systematically reveal the transcriptional regulatory and structural features of a bacterial genome. This capacity is particularly crucial for non-model bacteria that are not amenable to routine genetic manipulation.