Abstract: Dynamic cellular responses to environmental constraints are coordinated by the transcriptional regulatory network (TRN), which modulates gene expression. This network controls most fundamental cellular responses, including metabolism, motility, and stress responses. Here, we apply independent component analysis, an unsupervised machine learning approach, to 95 high-quality Sulfolobus acidocaldarius RNA-seq datasets and extract 45 independently modulated gene sets, or iModulons. Together, these iModulons contai…
“…Taken together, PRECISE-1K and iModulons extracted from it highlight the central role that top-down, data-driven methods must take in transcriptional regulatory network discovery across organisms. Indeed, iModulons have already successfully generated top-down regulatory networks for other organisms (Chauhan et al, 2021; Lim et al, 2022; Poudel et al, 2020; Rajput et al, 2022; Rychel et al, 2020; Sastry et al, 2019; Yoo et al, 2022). The success of PRECISE-1K serves to further cement both the importance of pursuing such efforts and the reliability of the results.…”
Section: Discussion. Citation type: mentioning (confidence: 99%)
“…Independent component analysis (ICA) (Comon, 1994) is a signal processing algorithm that outperforms other methods for the extraction of biologically meaningful regulatory modules from gene expression data (Saelens et al, 2018). Application of this method to publicly-available prokaryotic expression data has consistently recovered TRN modules across organisms (Chauhan et al, 2021; Poudel et al, 2020; Rajput et al, 2022; Rychel et al, 2020; Sastry et al, 2019; Yoo et al, 2022). ICA’s effectiveness results from its ability to identify independent groups of genes that vary consistently across samples, regardless of group size or overlapping membership.…”
Uncovering the structure of the transcriptional regulatory network (TRN) that modulates gene expression in prokaryotes remains an important challenge. Transcriptomics data is plentiful, necessitating the development of scalable methods for converting this data into useful knowledge about the TRN. Previously, we published the PRECISE dataset for Escherichia coli K-12 MG1655, containing 278 RNA-seq datasets created using a standardized protocol. Here, we present PRECISE 2.0, which is nearly three times the size of the original PRECISE dataset and also created using a standardized protocol. We analyze PRECISE 2.0 at multiple scales, demonstrating multiple analytical strategies for extracting knowledge from this dataset. Specifically, we: (1) highlight patterns in gene expression across the dataset; (2) utilize independent component analysis to extract 218 independently modulated groups of genes (iModulons) that describe the TRN at the systems level; (3) demonstrate the utility of iModulons over traditional differential expression analysis; and (4) uncover 6 new potential regulons. Thus, PRECISE 2.0 is a large-scale, high-quality transcriptomics dataset which may be analyzed at multiple scales to yield important biological insights.
“…Additionally, we found both the WhiB1 and GroEL/ES complex iModulons play a role in protein synthesis. WhiB1 also contains several genes that code for RNA polymerase subunits, and is likely a translation iModulon that has been seen in the ICA decompositions of other organisms (9, 10, 45). All three iModulons are related to growth and replication, which suggests that cell division is an important response in M. tuberculosis in a decreasing oxygen environment.…”
Mycobacterium tuberculosis H37Rv is one of the world's most impactful pathogens, and a large part of the success of the organism relies on the differential expression of its genes to adapt to its environment. The expression of the organism's genes is driven primarily by its transcriptional regulatory network, and most research on the TRN focuses on identifying and quantifying clusters of coregulated genes known as regulons.
“…and genes involved in translation such as infA and fusA which encode translation initiation factor IF-1 and elongation factor G respectively (Figure 2b). This iModulon has been enriched in almost all bacteria and archaea for which iModulons have been calculated [20, 22-25].…”
Section: Expanding the USA300 iModulons Using RNA-sequencing Data Fro... Citation type: mentioning
The complex crosstalk between metabolism and gene regulatory networks makes it difficult to untangle individual constituents and study their precise roles and interactions. To address this issue, we modularized the transcriptional regulatory network (TRN) of the Staphylococcus aureus strain by applying Independent Component Analysis (ICA) to 385 RNA sequencing samples. We then combined the modular TRN model with a metabolic model to study the regulation of carbon and amino acid metabolism. Our analysis showed that regulation of central carbon metabolism by CcpA and amino acid biosynthesis by CodY are closely coordinated. In general, S. aureus increases the expression of CodY-regulated genes in the presence of preferred carbon sources such as glucose. This transcriptional coordination was corroborated by metabolic model simulations that also showed increased amino acid biosynthesis in the presence of glucose. Further, we found that CodY and CcpA cooperatively regulate the expression of ribosome hibernation promoting factor, thus linking metabolic cues with translation. In line with this hypothesis, expression of CodY-regulated genes is tightly correlated with expression of genes encoding ribosomal proteins. Together, we propose a coarse-grained model where expression of S. aureus genes encoding enzymes that control carbon flux and nitrogen flux through the system is coregulated with expression of translation machinery to modularly control protein synthesis. While this work focuses on three key regulators, the full TRN model we present contains 76 total independently modulated sets of genes, each with the potential to uncover other complex regulatory structures and interactions.