Background Intron mediated enhancement (IME) is the potential of introns to enhance the expression of its respective gene. This essential function of introns has been observed in a wide range of species, including fungi, plants, and animals. However, the mechanisms underlying the enhancement are as of yet poorly understood. The goal of this study was to identify potential IME-related sequence motifs and genomic features in first introns of genes in Arabidopsis thaliana. Results Based on the rationale that functional sequence motifs are evolutionarily conserved, we exploited the deep sequencing information available for Arabidopsis thaliana, covering more than one thousand Arabidopsis accessions, and identified 81 candidate hexamer motifs with increased conservation across all accessions that also exhibit positional occurrence preferences. Of those, 71 were found associated with increased correlation of gene expression of genes harboring them, suggesting a cis-regulatory role. Filtering further for effect on gene expression correlation yielded a set of 16 hexamer motifs, corresponding to five consensus motifs. While all five motifs represent new motif definitions, two are similar to the two previously reported IME-motifs, whereas three are altogether novel. Both consensus and hexamer motifs were found associated with higher expression of alleles harboring them as compared to alleles containing mutated motif variants as found in naturally occurring Arabidopsis accessions. To identify additional IME-related genomic features, Random Forest models were trained for the classification of gene expression level based on an array of sequence-related features. The results indicate that introns contain information with regard to gene expression level and suggest sequence-compositional features as most informative, while position-related features, thought to be of central importance before, were found with lower than expected relevance. Conclusions Exploiting deep sequencing and broad gene expression information and on a genome-wide scale, this study confirmed the regulatory role on first-introns, characterized their intra-species conservation, and identified a set of novel sequence motifs located in first introns of genes in the genome of the plant Arabidopsis thaliana that may play a role in inducing high and correlated gene expression of the genes harboring them.
Mechanical properties of DNA have been implied to influence many its biological functions. Recently, a new high-throughput method, called loop-seq, that allows measuring the intrinsic bendability of DNA fragments, has been developed. Using loop-seq data, we created a deep learning model to explore the biological significance of local DNA flexibility in a range of different species from different kingdoms. Consistently, we observed a characteristic and largely nucleotide-composition-driven change of local flexibility near transcription start sites. No evidence of a generally present region of lowered flexibility upstream of transcription start sites to facilitate transcription factor binding was found. Yet, depending on the actual transcription factor investigated, flanking-sequence-dependent DNA flexibility was identified as a potential factor influencing binding. Compared to randomized genomic sequences, depending on species and taxa, actual genomic sequences were observed both with increased and lowered flexibility. Furthermore, in Arabidopsis thaliana, crossing-over and mutation rates, both de novo and fixed, were found to be linked to rigid sequence regions. Our study presents a range of significant correlations between characteristic DNA mechanical properties and genomic features, the significance of which with regard to detailed molecular relevance awaits further experimental and theoretical exploration.
Background. Intron mediated enhancement (IME) is the potential of introns to enhance expression of its respective gene. This essential function of introns has been observed in a wide range of species, including fungi, plants, and animals. Studies in the plant Arabidopsis thaliana have shown that enhancing introns exhibit a distinct base composition and are generally the first intron located close to the transcription start site. However, the mechanisms underlying the enhancement are as of yet poorly understood. The goal of the study was to identify potential IME-related sequence motifs and genomic features found in first introns of genes in the plant Arabidopsis thaliana. Results. Based on the rationale that functionale sequence motifs are evolutionarily conserved, we exploited the deep sequencing information available for Arabidopsis thaliana, covering more than one thousand Arabidopsis accessions, and identified 81 candidate hexamer motifs with increased conservation across all accessions, and which also exhibited positional occurrence preferences. Of those, 71 were found associated with increased correlation of gene expression of genes harboring them, suggesting a cis-regulatory role. Filtering further for effect on gene expression correlation yielded a set of 16 hexamer motifs, corresponding to five consensus motifs. While all five motifs represent new motif definitions, two are similar to the two previously reported IME-motifs, whereas three are altogether novel. To identify additional IME-related genomic features, Random Forest models were trained for classification of gene expression level based on an array of different sequence-related features. The results indicate that introns harbor information with regard to gene expression level and suggest sequence-compositional features as most informative, while position-related features, that were thought to be of central importance before, were found with lower than expected relevance. Conclusions. Exploiting deep sequencing and broad gene expression information and on a genome-wide scale, this study confirmed the regulatory role on first-introns, characterized their intra-species conservation, and identified a set of novel sequence motifs located in first introns of genes in the genome of the plant Arabidopsis thalian a that may play a role in inducing high and correlated gene expression of the genes harboring them.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.