Abstract: RNA G-quadruplex (rG4) is a vital RNA tertiary structure motif that involves base pairing on both the Hoogsteen and Watson-Crick faces of guanines. rG4s are of great importance in the post-transcriptional regulation of gene expression. Experimental technologies have advanced to identify in vitro and in vivo rG4s across diverse transcriptomes. Building on these recent advances, here we present G4Atlas, the first transcriptome-wide G-quadruplex database, in which we have collated, classified, and visualized transcri…
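One citing overview notes that G4Atlas classifies rG4s into canonical and other types. To make "canonical" concrete, here is a minimal sketch that assumes the common rule-of-thumb pattern: four runs of at least three guanines separated by loops of 1 to 7 nucleotides. The pattern and the helper `find_canonical_g4` are illustrative assumptions, not G4Atlas's published classification scheme, and the sketch only scans sequence. It says nothing about whether a motif actually folds in vitro or in vivo.

```python
import re

# Assumed rule of thumb for a "canonical" G-quadruplex motif:
# four runs of >=3 guanines separated by loops of 1-7 nucleotides.
# Illustrative only; not G4Atlas's published classifier.
CANONICAL_G4 = re.compile(r"(?:G{3,}[ACGU]{1,7}){3}G{3,}")


def find_canonical_g4(seq: str) -> list[tuple[int, int, str]]:
    """Return (start, end, motif) for non-overlapping canonical-G4 matches.

    Hypothetical helper. DNA input is accepted: the sequence is
    upper-cased and T is mapped to U before scanning.
    """
    rna = seq.upper().replace("T", "U")
    return [(m.start(), m.end(), m.group()) for m in CANONICAL_G4.finditer(rna)]


if __name__ == "__main__":
    print(find_canonical_g4("AAGGGUGGGUGGGUGGGAA"))
```

Because `finditer` returns non-overlapping matches and loops may themselves contain guanines, overlapping or alternative G-run registrations are not enumerated. Non-canonical classes (for example long loops or bulges) would need additional patterns.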
“…We first obtained all rG4 motifs in the 5’ UTRs from our G4Atlas database 33. Subsequently, we identified all rG4 motifs associated with translation using our model’s attention contrast matrix across the transcriptome (Methods).…”
Section: Results (mentioning)
confidence: 99%
“…We obtained all potential rG4s in rice from our G4Atlas database 33. Next, we aligned the rG4 sequences with the corresponding attention contrast matrix and employed the paired t-test to assess statistical significance.…”
Section: Methods (mentioning)
confidence: 99%
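The Methods excerpt above describes aligning rG4 sequences with an attention contrast matrix and applying a paired t-test. A minimal sketch follows, assuming each rG4 contributes one pair of values: its mean attention-contrast score inside the motif and the mean over the remaining positions of the same transcript. This pairing, the 1-D per-nucleotide score input, and the helper names `motif_vs_background` and `paired_t` are hypothetical, because the excerpt does not specify them. `scipy.stats.ttest_rel` computes the same statistic and also returns a p-value.

```python
from math import sqrt
from statistics import mean, stdev


def motif_vs_background(scores: list[float], start: int, end: int) -> tuple[float, float]:
    """Mean score inside [start, end) and mean over all other positions.

    `scores` is an assumed per-nucleotide attention-contrast profile for
    one transcript; both regions must be non-empty.
    """
    inside = scores[start:end]
    outside = scores[:start] + scores[end:]
    if not inside or not outside:
        raise ValueError("motif and background must both be non-empty")
    return mean(inside), mean(outside)


def paired_t(xs: list[float], ys: list[float]) -> tuple[float, int]:
    """Paired t statistic and degrees of freedom for matched samples xs, ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [x - y for x, y in zip(xs, ys)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    n = len(diffs)
    return mean(diffs) / (sd / sqrt(n)), n - 1


if __name__ == "__main__":
    # Toy profiles, each with the (start, end) of one rG4; values are invented.
    profiles = [
        ([0.1, 0.2, 0.9, 0.8, 0.1], 2, 4),
        ([0.0, 0.7, 0.6, 0.1, 0.2], 1, 3),
        ([0.3, 0.1, 0.2, 0.9, 0.7], 3, 5),
    ]
    pairs = [motif_vs_background(s, a, b) for s, a, b in profiles]
    t, df = paired_t([p[0] for p in pairs], [p[1] for p in pairs])
    print(f"t = {t:.2f}, df = {df}")
```

Pairing each motif with its own transcript's background controls for transcript-level differences in overall attention. That is the usual reason to prefer a paired over an unpaired test here.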
“…RNA G-quadruplexes (rG4s) are one of the RNA tertiary structure motifs formed by the stacking of two or more G-quartets, composed of four guanines held together by both Watson-Crick and Hoogsteen hydrogen bonds 8,32,33. Previous studies have demonstrated the important role of individual rG4s in repressing translation 34.…”
Section: PlantRNA-FM Globally Identifies the Translation-associated R... (mentioning)
The complex ‘language’ of plant RNA encodes a vast array of biological regulatory elements that orchestrate crucial aspects of plant growth, development, and adaptation to environmental stresses. Recent advancements in foundation models (FMs) have demonstrated their unprecedented potential to decipher complex ‘language’ in biology. In this study, we introduced PlantRNA-FM, a novel high-performance and interpretable RNA FM specifically designed around RNA features, including both sequence and structure. PlantRNA-FM was pre-trained on an extensive dataset integrating RNA sequences and RNA structure information from 1,124 distinct plant species. PlantRNA-FM exhibits superior performance in plant-specific downstream tasks, such as plant RNA annotation prediction and RNA translation efficiency (TE) prediction. Compared with the second-best FMs, PlantRNA-FM achieved an F1 score improvement of up to 52.45% in RNA genic region annotation prediction and up to 15.30% in translation efficiency prediction. PlantRNA-FM is empowered by our interpretable framework, which facilitates the identification of biologically functional RNA sequence and structure motifs, including both RNA secondary and tertiary structure motifs, across transcriptomes. Through experimental validations, we revealed novel translation-associated RNA motifs in plants. PlantRNA-FM also highlighted the importance of the position of these functional RNA motifs within genic regions. Taken together, PlantRNA-FM facilitates the exploration of functional RNA motifs across complex transcriptomes, empowering plant scientists with novel capabilities for programming RNA codes in plants.
“…A trio of new nucleic acid quadruplex-related databases also feature. G4Atlas (5) focuses on experimentally determined RNA G-quadruplexes (rG4s) across transcriptomes, determined by a variety of experimental methods, and accompanied by their classification into canonical and other types. QUADRAtlas (6) similarly focuses on rG4s, covering both experimental and predicted structures and including information on rG4-binding proteins, while GAIA (7) surveys predicted quadruplexes in both genomes and transcriptomes across all three kingdoms.…”
The 2023 Nucleic Acids Research Database Issue contains 178 papers ranging across biology and related fields. There are 90 papers reporting on new databases and 82 updates from resources previously published in the Issue. Six more papers are updates from databases most recently published elsewhere. Major nucleic acid databases reporting updates include GenBank, ENA, ChIPBase, JASPAR, mirDIP and the Issue's first Breakthrough Article, NACDDB for Circular Dichroism data. Updates from BMRB and RCSB cover experimental protein structural data, while AlphaFold 2 computational structure predictions feature widely. STRING and REBASE are stand-out updates in the signalling and enzymes section. Immunology-related databases include CEDAR, the second Breakthrough Article, for cancer epitopes and receptors, alongside the returning IPD-IMGT/HLA and the new PGG.MHC. Genomics-related resources include Ensembl, GWAS Central and UCSC Genome Browser. Major returning databases for drugs and their targets include Open Targets, DrugCentral, CTD and PubChem. The EMPIAR image archive appears in the Issue for the first time. The entire database Issue is freely available online on the Nucleic Acids Research website (https://academic.oup.com/nar). The NAR online Molecular Biology Database Collection has been updated, revisiting 463 entries, adding 92 new resources and eliminating 96 discontinued URLs, bringing the current total to 1764 databases. It is available at http://www.oxfordjournals.org/nar/database/c/.
“…Besides the right-handed double helical structure [1,2], nucleic acids can fold into a variety of secondary structures like hairpin, triplex, quadruplex, i-motif, etc. [3][4][5][6][7][8][9]. Multiple pieces of evidence unfold the instrumental role of quadruplex in regulating the biological processes across all the domains of life [10][11][12][13][14][15][16][17][18][19][20][21][22]. This class of thermodynamically stable alternative structure encompasses four guanine-rich strands [23] (thus the name quadruplex or tetraplex), stabilized by planar G-tetrads (Figure 1A) that stack on each other (Figure 1B, 1C).…”
DNA quadruplexes take part in many biological functions. They adopt a variety of folds depending on sequence and environment. Here, a meticulous analysis of 392 experimentally determined quadruplex structures (388 PDB IDs) deposited in the PDB is carried out. The analysis reveals a modular representation of quadruplex folds. Forty-eight unique quadruplex motifs (whose diversity arises from the propeller, bulge, diagonal, and lateral loops that connect the quartets) are identified, giving rise to simple to complex inter-/intramolecular quadruplex folds. These two-layered structural motifs are further classified into 33 continuous and 15 discontinuous motifs. The discontinuous motifs cannot be further classified as parallel, antiparallel, or hybrid because one or more guanines of adjacent quartets are not connected. While continuous motifs can be extended to a quadruplex fold, discontinuous motifs require additional loop(s) to complete a fold, as illustrated here with examples. Similarly, higher-order quadruplex folds can also be represented by continuous or discontinuous motifs or their combinations. Such a modular representation of quadruplex folds may assist in the custom engineering of quadruplexes, the design of motif-based drugs, and the prediction of quadruplex structure. Further, it could facilitate understanding of the role of quadruplexes in biological functions and diseases.