Intracellular pathogens such as Mycobacterium tuberculosis have evolved strategies for coping with the pressures encountered inside host cells. The ability to coordinate global gene expression in response to environmental and internal cues is one key to their success. Prolonged survival and replication within macrophages, a key virulence trait of M. tuberculosis, requires dynamic adaptation to diverse and changing conditions within its phagosomal niche. However, the physiological adaptations during the different phases of this infection process remain poorly understood. To address this knowledge gap, we have developed a multi-tiered approach to define the temporal patterns of gene expression in M. tuberculosis in a macrophage infection model that extends from infection, through intracellular adaptation, to the establishment of a productive infection. Using a clock plasmid to measure intracellular replication and death rates over a 14-day infection and electron microscopy to define bacterial integrity, we observed an initial period of rapid replication coupled with a high death rate. This was followed by a period of slowed growth and enhanced intracellular survival, leading finally to an extended period of net growth. The transcriptional profiles of M. tuberculosis reflect these physiological transitions as the bacterium adapts to conditions within its host cell. Finally, analysis with a Transcriptional Regulatory Network model revealed linked genetic networks whereby M. tuberculosis coordinates global gene expression during intracellular survival. The integration of molecular and cellular biology together with transcriptional profiling and systems analysis offers unique insights into the host-driven responses of intracellular pathogens such as M. tuberculosis.
Only a few small RNAs (sRNAs) have been characterized in Mycobacterium tuberculosis, and their role in regulatory networks is still poorly understood. Here we report a genome-wide characterization of sRNAs in M. tuberculosis integrating experimental and computational analyses. Global RNA-seq analysis of exponentially growing cultures of M. tuberculosis H37Rv had previously identified 1373 sRNA species. In the present report we show that 258 (19%) of these were also identified by microarray expression analysis. This set included 22 intergenic sRNAs, 84 sRNAs mapping within 5′/3′ UTRs, and 152 antisense sRNAs. Analysis of promoter and terminator consensus sequences identified sigma A promoter consensus sequences for 121 sRNAs (47%), terminator consensus motifs for 22 sRNAs (8.5%), and both motifs for 35 sRNAs (14%). Additionally, 20/23 candidates were visualized by Northern blot analysis, and 5′ end mapping by primer extension confirmed the RNA-seq data. We also used a computational approach utilizing functional enrichment to identify the pathways targeted by sRNA regulation. We found that antisense sRNAs preferentially regulated transcription of membrane-bound proteins. Genes putatively regulated by novel cis-encoded sRNAs were enriched for two-component systems and for functional pathways involved in hydrogen transport on the membrane.
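The counts in the abstract above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that uses only the numbers stated in the abstract; the variable names and the `pct` helper are illustrative and not part of the original study.

```python
# Counts as reported in the abstract above.
total_rnaseq = 1373  # sRNA species previously identified by RNA-seq
by_class = {"intergenic": 22, "UTR": 84, "antisense": 152}

# The three classes should add up to the microarray-confirmed set.
confirmed = sum(by_class.values())


def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Share of RNA-seq sRNAs also seen by microarray.
share_confirmed = pct(confirmed, total_rnaseq)

# Motif counts, expressed relative to the confirmed set.
motif_counts = {"sigma A promoter": 121, "terminator": 22, "both": 35}
motif_pct = {name: round(pct(n, confirmed), 1) for name, n in motif_counts.items()}

print(confirmed, round(share_confirmed, 1), motif_pct)
```

The class counts sum to 258, and the motif percentages match the reported figures when taken relative to the 258 confirmed sRNAs rather than to all 1373.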
The escalating amount of genome-scale data demands a pragmatic stance from the research community. How can we utilize this deluge of information to better understand biology, cure diseases, or engage cells in bioremediation or biomaterial production for various purposes? A research pipeline moving new sequence, expression and binding data towards practical end goals seems to be necessary. While most individual researchers are not motivated by such well-articulated pragmatic end goals, the scientific community has already self-organized to successfully convert genomic data into fundamentally new biological knowledge and practical applications. Here we review two important steps in this workflow: network inference and network response identification, applied to transcriptional regulatory networks. Among network inference methods, we concentrate on relevance networks due to their conceptual simplicity. We classify and discuss network response identification approaches as either data-centric or network-centric. Finally, we conclude with an outlook on what is still missing from these approaches and what may be ahead on the road to biological discovery.
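The conceptual simplicity of relevance networks, mentioned in the abstract above, can be illustrated with a small sketch: score every gene pair with a similarity measure and keep the pairs above a threshold. The abstract does not specify the measure, so this sketch assumes absolute Pearson correlation (mutual information is another common choice). The gene names, expression values, and threshold are invented for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def relevance_network(expression, threshold=0.9):
    """Return undirected edges between genes whose |correlation| >= threshold."""
    edges = set()
    for g1, g2 in combinations(sorted(expression), 2):
        if abs(pearson(expression[g1], expression[g2])) >= threshold:
            edges.add((g1, g2))
    return edges


# Hypothetical expression profiles across four conditions.
toy = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.1],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
edges = relevance_network(toy, threshold=0.9)
print(edges)
```

In this toy data only geneA and geneB are linked. Real analyses would typically choose the threshold by permutation testing or a similar significance estimate rather than fixing it by hand.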
The Cancer Genome Atlas (TCGA) is a multidisciplinary, multi-institutional effort to characterize several types of cancer. Datasets from biomedical domains such as TCGA present a particularly challenging task for those interested in dynamically aggregating its results because the data sources are typically both heterogeneous and distributed. The Linked Data best practices offer a solution to integrate and discover data with those characteristics, namely through exposure of data as Web services supporting SPARQL, the Resource Description Framework query language. Most SPARQL endpoints, however, cannot easily be queried by data experts. Furthermore, exposing experimental data as SPARQL endpoints remains a challenging task because, in most cases, data must first be converted to Resource Description Framework triples. In line with those requirements, we have developed an infrastructure to expose clinical, demographic and molecular data elements generated by TCGA as a SPARQL endpoint by assigning elements to entities of the Simple Sloppy Semantic Database (S3DB) management model. All components of the infrastructure are available as independent Representational State Transfer (REST) Web services to encourage reusability, and a simple interface was developed to automatically assemble SPARQL queries by navigating a representation of the TCGA domain. A key feature of the proposed solution that greatly facilitates assembly of SPARQL queries is the distinction between the TCGA domain descriptors and data elements. Furthermore, the use of the S3DB management model as a mediator enables queries to both public and protected data without the need for prior submission to a single data source.
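The idea of assembling SPARQL queries from a navigable domain description, as described in the abstract above, can be sketched as plain string construction. This is not the authors' interface. The `build_select` function and the `http://example.org/tcga#` URIs are hypothetical placeholders, and no endpoint is contacted.

```python
def build_select(class_uri, property_uris, limit=10):
    """Assemble a SPARQL SELECT over instances of one class.

    class_uri plays the role of a domain descriptor; each entry in
    property_uris names a data element to retrieve for every instance.
    """
    variables = ["?item"] + [f"?v{i}" for i in range(len(property_uris))]
    patterns = [f"?item a <{class_uri}> ."]
    for i, prop in enumerate(property_uris):
        patterns.append(f"?item <{prop}> ?v{i} .")
    lines = ["SELECT " + " ".join(variables), "WHERE {"]
    lines += ["  " + p for p in patterns]
    lines += ["}", f"LIMIT {limit}"]
    return "\n".join(lines)


# Hypothetical URIs, used only to show the shape of the generated query.
query = build_select(
    "http://example.org/tcga#Patient",
    ["http://example.org/tcga#gender", "http://example.org/tcga#tumorType"],
    limit=5,
)
print(query)
```

Keeping the class (descriptor) separate from the properties (data elements) mirrors the distinction the abstract highlights. A production version would also need to validate or escape URIs before interpolating them into the query.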
Background: Data, data everywhere. The diversity and magnitude of the data generated in the Life Sciences defies automated articulation among complementary efforts. The additional need in this field for managing property and access permissions compounds the difficulty very significantly. This is particularly the case when the integration involves multiple domains and disciplines, even more so when it includes clinical and high throughput molecular data. Methodology/Principal Findings: The emergence of Semantic Web technologies brings the promise of meaningful interoperation between data and analysis resources. In this report we identify a core model for biomedical Knowledge Engineering applications and demonstrate how this new technology can be used to weave a management model where multiple intertwined data structures can be hosted and managed by multiple authorities in a distributed management infrastructure. Specifically, the demonstration is performed by linking data sources associated with the Lung Cancer SPORE awarded to The University of Texas MD Anderson Cancer Center at Houston and the Southwestern Medical Center at Dallas. A software prototype, available as open source at www.s3db.org, was developed and its proposed design has been made publicly available as an open source instrument for shared, distributed data management. Conclusions/Significance: The Semantic Web technologies have the potential to address the need for distributed and evolvable representations that are critical for systems biology and translational biomedical research. As this technology is incorporated into application development we can expect that both general purpose productivity software and domain specific software installed on our personal computers will become increasingly integrated with the relevant remote resources. In this scenario, the acquisition of a new dataset should automatically trigger the delegation of its analysis.