Unraveling gene function is pivotal to understanding the signaling cascades that control plant development and stress responses. As experimental profiling is costly and labor intensive, there is a clear need for high-confidence computational annotation. In contrast to detailed gene-specific functional information, transcriptomics data are widely available for both model and crop species. Here, we describe a novel automated function prediction method, which leverages complementary information from multiple expression datasets by analyzing study-specific gene co-expression networks. First, we benchmarked the prediction performance on recently characterized Arabidopsis thaliana genes, and showed that our method outperforms state-of-the-art expression-based approaches. Next, we predicted biological process annotations for known (n = 15 790) and unknown (n = 11 865) genes in A. thaliana and validated our predictions using experimental protein-DNA and protein-protein interaction data (covering >220 000 interactions in total), obtaining a set of high-confidence functional annotations. Our method assigned at least one validated annotation to 5054 (42.6%) unknown genes, and at least one novel validated function to 3408 (53.0%) genes with computational annotations only. These omics-supported functional annotations shed light on a variety of developmental processes and molecular responses, such as flower and root development, defense responses to fungi and bacteria, and phytohormone signaling, and help fill the information gap on biological process annotations in Arabidopsis. An in-depth analysis of two context-specific networks, modeling seed development and response to water deprivation, shows how previously uncharacterized genes function within the respective networks. Moreover, our automated function prediction approach can be applied in future studies to facilitate gene discovery for crop improvement.
Diatoms are a diverse group of mainly photosynthetic algae, responsible for 20% of worldwide oxygen production, which can rapidly respond to favourable conditions and often outcompete other phytoplankton. We investigated the contribution of horizontal gene transfer (HGT) to their ecological success. A large-scale phylogeny-based prokaryotic HGT detection procedure across nine sequenced diatoms showed that 3-5% of their proteome has a horizontal origin and that a large influx occurred at the ancestor of diatoms. More than 90% of HGT genes are expressed, and species-specific HGT genes in Phaeodactylum tricornutum undergo strong purifying selection. Genes derived from HGT are implicated in several processes, including environmental sensing, and expand the metabolic toolbox. Cobalamin (vitamin B12) is an essential cofactor for roughly half of the diatoms and is only produced by bacteria. Five consecutive genes involved in the final synthesis of the cobalamin biosynthetic pathway, which could function as scavenging and repair genes, were detected as HGT. The full suite of these genes was detected in the cold-adapted diatom Fragilariopsis cylindrus. This might give diatoms originating from the Southern Ocean, a region typically depleted in cobalamin, a competitive advantage. Overall, we show that HGT is a prevalent mechanism that is actively used in diatoms to expand their adaptive capabilities.
Diatoms are a diverse group of mainly photosynthetic algae, responsible for 20% of worldwide oxygen production, which can rapidly respond to favourable conditions and often outcompete other phytoplankton. We investigated the contribution of horizontal gene transfer (HGT) to their ecological success. A systematic phylogeny-based bacterial HGT detection procedure across nine sequenced diatoms showed that 3-5% of their proteome has a horizontal origin and that a large influx occurred at the ancestor of diatoms. More than 90% of HGT genes are expressed, and species-specific HGT genes in Phaeodactylum tricornutum undergo strong purifying selection. They are implicated in several processes, including environmental sensing, and expand the metabolic toolbox. Cobalamin (vitamin B12) is an essential cofactor for roughly half of the diatoms and is only produced by bacteria. Genes involved in its final synthesis were detected as HGT, including five consecutive enzymes in Fragilariopsis cylindrus. This might give diatoms originating from the Southern Ocean, a region typically depleted in cobalamin, a competitive advantage. Overall, we show that HGT is a prevalent mechanism that is actively used in diatoms to expand their adaptive capabilities.
Thousands of long intergenic noncoding RNAs (lincRNAs) have been identified in plant genomes. While some lincRNAs have been characterized as important regulators in different biological processes, little is known about the transcriptional regulation for most plant lincRNAs. Through the integration of eight annotation resources, we defined 6,599 high-confidence lincRNA loci in Arabidopsis (Arabidopsis thaliana). For lincRNAs belonging to different evolutionary age categories, we identified major differences in sequence and chromatin features, as well as in the level of conservation and purifying selection acting during evolution. Spatiotemporal gene expression profiles combined with transcription factor (TF) chromatin immunoprecipitation data were used to construct a TF-lincRNA regulatory network containing 2,659 lincRNAs and 15,686 interactions. We found that properties characterizing lincRNA expression, conservation and regulation differ between plants and animals. Experimental validation confirmed the role of three TFs, KANADI 1 (KAN1), MYB DOMAIN PROTEIN 44 (MYB44), and PHYTOCHROME INTERACTING FACTOR 4 (PIF4), as key regulators controlling root-specific lincRNA expression, demonstrating the predictive power of our network. Furthermore, we identified 58 lincRNAs, regulated by these TFs, showing strong root cell-type specific expression or chromatin accessibility, which are linked with GWAS genetic associations related to root system development and growth. The multi-level genome-wide characterization covering chromatin state information, promoter conservation, and ChIP-based TF binding, for all detectable lincRNAs across 769 expression samples, permits rapidly defining the biological context and relevance of Arabidopsis lincRNAs through regulatory networks.
Unraveling gene functions is pivotal to understanding the signaling cascades controlling plant development and stress responses. Given that experimental profiling is costly and labor intensive, the need for high-confidence computational annotations is evident. In contrast to detailed gene-specific functional information, transcriptomics data are widely available in both model and crop species. Here, we developed a novel automated function prediction (AFP) algorithm, leveraging complementary information present in multiple expression datasets through the analysis of study-specific gene co-expression networks. Benchmarking the prediction performance on recently characterized Arabidopsis thaliana genes, we showed that our method outperforms state-of-the-art expression-based approaches. Next, we predicted biological process annotations for known (n=15,790) and unknown (n=11,865) genes in A. thaliana and validated our predictions using experimental protein-DNA and protein-protein interaction data (covering >220 thousand interactions in total), obtaining a set of high-confidence functional annotations. In total, 5,054 (42.6%) unknown genes were assigned at least one validated annotation, and 3,408 (53.0%) genes with only computational annotations gained at least one novel validated function. These omics-supported functional annotations shed light on a variety of developmental processes and molecular responses, such as flower and root development, defense responses to fungi and bacteria, and phytohormone signaling, and help alleviate the knowledge gap in biological process annotations in Arabidopsis. An in-depth analysis of two context-specific networks, modeling seed development and response to water deprivation, shows how previously uncharacterized genes function within the respective networks. Moreover, our AFP approach can be applied in future studies to facilitate gene discovery for crop improvement.
The anaphase-promoting complex/cyclosome (APC/C) marks key cell cycle proteins for proteasomal breakdown, thereby ensuring unidirectional progression through the cell cycle. Its target recognition is temporally regulated by activating subunits, one of which is called CELL CYCLE SWITCH 52 A2 (CCS52A2). We sought to expand the knowledge on the APC/C by using the severe growth phenotypes of CCS52A2-deficient Arabidopsis (Arabidopsis thaliana) plants as a readout in a suppressor mutagenesis screen, resulting in the identification of the previously undescribed gene called PIKMIN1 (PKN1). PKN1 deficiency rescues the disorganized root stem cell phenotype of the ccs52a2-1 mutant, whereas an excess of PKN1 inhibits growth of ccs52a2-1 plants, indicating the need for control of PKN1 abundance for proper development. Accordingly, the lack of PKN1 in a wild-type background negatively impacts cell division, while its systemic overexpression promotes proliferation. PKN1 shows a cell cycle phase-dependent accumulation pattern, localizing to microtubular structures, including the preprophase band, the mitotic spindle, and the phragmoplast. PKN1 is conserved throughout the plant kingdom, with its function in cell division being evolutionarily conserved in the liverwort Marchantia polymorpha. Our data thus demonstrate that PKN1 represents a novel, plant-specific gene with a role in cell division that is likely proteolytically controlled by the CCS52A2-activated APC/C.
Thousands of long intergenic noncoding RNAs (lincRNAs) have been identified in plant genomes. While some lincRNAs have been characterized as important regulators in different biological processes, little is known about the transcriptional regulation of most plant lincRNAs. Through the integration of eight annotation resources, we defined 6,599 high-confidence lincRNA loci in Arabidopsis thaliana. For lincRNAs belonging to different evolutionary age categories, we identified major differences in sequence and chromatin features, as well as in the level of conservation and selection. Spatiotemporal gene expression profiles combined with transcription factor (TF) chromatin immunoprecipitation data were used to construct a TF-lincRNA regulatory network containing 2,659 lincRNAs and 43,233 interactions. We experimentally confirmed the role of three TFs, KAN1, MYB44, and PIF4, as key regulators controlling root-specific lincRNA expression, demonstrating the predictive power of our network. Furthermore, we identified 58 lincRNAs regulated by these TFs that show strong root cell-type-specific expression and are linked with GWAS genetic associations related to root system development and growth. The multi-omics approach applied in this study sheds light on the global regulatory complexity of plant lincRNA networks and pinpoints a role of specific TFs in lincRNA regulation in roots.