Plant endogenous small RNAs (sRNAs) are important regulators of gene expression. There are two broad categories of plant sRNAs: microRNAs (miRNAs) and endogenous short interfering RNAs (siRNAs). MicroRNA loci are relatively well-annotated but comprise only a small minority of the total sRNA pool; siRNA locus annotations have lagged far behind. Here, we used a large dataset of published and newly generated sRNA sequencing data (1,333 sRNA-seq libraries containing over 20 billion reads) and a uniform bioinformatic pipeline to produce comprehensive sRNA locus annotations of 47 diverse plants, yielding over 2.7 million sRNA loci. The two most numerous classes of siRNA loci produced mainly 24 nucleotide and 21 nucleotide siRNAs, respectively. 24 nucleotide-dominated siRNA loci usually occurred in intergenic regions, especially at the 5'-flanking regions of protein-coding genes. In contrast, 21 nucleotide-dominated siRNA loci were most often derived from double-stranded RNA precursors copied from spliced mRNAs. Genic 21 nucleotide-dominated loci were especially common from disease resistance genes, including from a large number of monocots. Individual siRNA sequences of all types showed very little conservation across species, while mature miRNAs were more likely to be conserved. We developed a web server where our data and several search and analysis tools are freely accessible at
Plant endogenous small RNAs (sRNAs) are important regulators of gene expression. There are two broad categories of plant sRNAs: microRNAs (miRNAs) and endogenous short interfering RNAs (siRNAs). MicroRNA loci are relatively well-annotated but compose only a small minority of the total sRNA pool; siRNA locus annotations have lagged far behind. Here, we used a large data set of published and newly generated sRNA sequencing data (1333 sRNA-seq libraries containing more than 20 billion reads) and a uniform bioinformatic pipeline to produce comprehensive sRNA locus annotations of 47 diverse plants, yielding more than 2.7 million sRNA loci. The two most numerous classes of siRNA loci produced mainly 24- and 21-nucleotide (nt) siRNAs, respectively. Most often, 24-nt-dominated siRNA loci occurred in intergenic regions, especially at the 5′-flanking regions of protein-coding genes. In contrast, 21-nt-dominated siRNA loci were most often derived from double-stranded RNA precursors copied from spliced mRNAs. Genic 21-nt-dominated loci were especially common from disease resistance genes, including from a large number of monocots. Individual siRNA sequences of all types showed very little conservation across species, whereas mature miRNAs were more likely to be conserved. We developed a web server where our data and several search and analysis tools are freely accessible.
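As a rough illustration of what a "24-nt-dominated" or "21-nt-dominated" locus means, the sketch below labels a locus by its most common read length. It is a minimal sketch only: the function name `dominant_size_class` and the 50% threshold `min_fraction` are assumptions made for illustration, not the criteria used by the annotation pipeline described in the abstract.

```python
from collections import Counter


def dominant_size_class(read_lengths, min_fraction=0.5):
    """Return the most common read length at a locus, or None.

    read_lengths: iterable of integer lengths (nt) of the sRNA reads
                  aligned to one locus.
    min_fraction: hypothetical cutoff; the most common length must make up
                  at least this share of reads for the locus to be labelled.
    """
    lengths = list(read_lengths)
    if not lengths:
        return None
    size, count = Counter(lengths).most_common(1)[0]
    if count / len(lengths) >= min_fraction:
        return size
    return None


# Example: three of four reads are 24 nt, so the locus is labelled 24.
print(dominant_size_class([24, 24, 24, 21]))
```

A locus whose reads are spread evenly over several lengths returns `None` under this cutoff, which is one simple way to keep ambiguous loci out of either class.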
Summary Cassava (Manihot esculenta Crantz, 2n = 36) is a global food security crop. It has a highly heterozygous genome, high genetic load, and genotype‐dependent asynchronous flowering. It is typically propagated by stem cuttings and any genetic variation between haplotypes, including large structural variations, is preserved by such clonal propagation. Traditional genome assembly approaches generate a collapsed haplotype representation of the genome. In highly heterozygous plants, this results in artifacts and an oversimplification of heterozygous regions. We used a combination of Pacific Biosciences (PacBio), Illumina, and Hi‐C to resolve each haplotype of the genome of a farmer‐preferred cassava line, TME7 (Oko‐iyawo). PacBio reads were assembled using the FALCON suite. Phase switch errors were corrected using FALCON‐Phase and Hi‐C read data. The ultralong‐range information from Hi‐C sequencing was also used for scaffolding. Comparison of the two phases revealed >5000 large haplotype‐specific structural variants affecting over 8 Mb, including insertions and deletions spanning thousands of base pairs. The potential of these variants to affect allele‐specific expression was further explored. RNA‐sequencing data from 11 different tissue types were mapped against the scaffolded haploid assembly and gene expression data are incorporated into our existing easy‐to‐use web‐based interface to facilitate use by the broader plant science community. These two assemblies provide an excellent means to study the effects of heterozygosity, haplotype‐specific structural variation, gene hemizygosity, and allele‐specific gene expression contributing to important agricultural traits and further our understanding of the genetics and domestication of cassava.
MicroRNAs, which target mRNAs for post-transcriptional regulation, and heterochromatic siRNAs, which target chromatin causing DNA methylation, make up the majority of the endogenous regulatory small RNA pool in most plant specimens. They both function to guide Argonaute proteins to targeted nucleic acids on the basis of complementarity. Recent work on plant miRNA-target interactions has clarified the general ‘rules’ of complementarity, while also providing several intriguing exceptions to these rules. In addition, emerging evidence suggests that several factors besides miRNA-target complementarity affect plant miRNA function. For heterochromatic siRNAs, recent work has made progress towards comprehensively identifying potential target regions, but numerous fundamental questions remain to be answered.
Plant small RNAs (sRNAs) modulate key physiological mechanisms through post-transcriptional and transcriptional silencing of gene expression. Small RNAs fall into two major categories: those that are reliant on RNA-dependent RNA polymerases (RDRs) for biogenesis and those that are not. Known RDR1/2/6-dependent sRNAs include phased and repeat-associated short interfering RNAs, while known RDR1/2/6-independent sRNAs are primarily microRNAs (miRNAs) and other hairpin-derived sRNAs. In this study we produced and analyzed sRNA-seq libraries from rdr1/rdr2/rdr6 triple mutant plants. We found 58 previously annotated miRNA loci that were reliant on RDR1, -2, or -6 function, casting doubt on their classification. We also found 38 RDR1/2/6-independent sRNA loci that are not MIRNAs or otherwise hairpin-derived, and did not fit into other known paradigms for sRNA biogenesis. These 38 sRNA-producing loci have as-yet-undescribed biogenesis mechanisms, and are frequently located in the vicinity of protein-coding genes. Altogether, our analysis suggests that these 38 loci represent one or more undescribed types of sRNA in Arabidopsis thaliana.
Cassava (Manihot esculenta Crantz, 2n=36) is a global food security crop. Cassava has a highly heterozygous genome, high genetic load, and genotype-dependent asynchronous flowering. It is typically propagated by stem cuttings and any genetic variation between haplotypes, including large structural variations, is preserved by such clonal propagation. Traditional genome assembly approaches generate a collapsed haplotype representation of the genome. In highly heterozygous plants, this results in artifacts and an oversimplification of heterozygous regions. We used a combination of Pacific Biosciences (PacBio), Illumina, and Hi-C to resolve each haplotype of the genome of a farmer-preferred cassava line, TME7 (Oko-iyawo). PacBio reads were assembled using the FALCON suite. Phase switch errors were corrected using FALCON-Phase and Hi-C read data. The ultra-long-range information from Hi-C sequencing was also used for scaffolding. Comparison of the two phases revealed more than 5,000 large haplotype-specific structural variants affecting over 8 Mb, including insertions and deletions spanning thousands of base pairs. The potential of these variants to affect allele specific expression was further explored. RNA-seq data from 11 different tissue types were mapped against the scaffolded haploid assembly and gene expression data are incorporated into our existing easy-to-use web-based interface to facilitate use by the broader plant science community. These two assemblies provide an excellent means to study the effects of heterozygosity, haplotype-specific structural variation, gene hemizygosity, and allele specific gene expression contributing to important agricultural traits and further our understanding of the genetics and domestication of cassava.
Summary Plant small RNAs regulate key physiological mechanisms through post-transcriptional and transcriptional silencing of gene expression. sRNAs fall into two major categories: those that are reliant on RNA Dependent RNA Polymerases (RDRs) for biogenesis and those that aren't. Known RDR-dependent sRNAs include phased and repeat-associated short interfering RNAs, while known RDR-independent sRNAs are primarily microRNAs and other hairpin-derived sRNAs. In this study, we produced and analyzed small RNA-seq libraries from rdr1/rdr2/rdr6 triple mutant plants. Only a small fraction of all sRNA loci were RDR1/RDR2/RDR6-independent; most of these were microRNA loci or associated with predicted hairpin precursors. We found 58 previously annotated microRNA loci that were reliant on RDR1, -2, or -6 function, casting doubt on their classification. We also found 38 RDR1/2/6-independent small RNA loci that are not MIRNAs or otherwise hairpin-derived, and did not fit into other known paradigms for small RNA biogenesis. These 38 small RNA-producing loci have novel biogenesis mechanisms, and are frequently located in the vicinity of protein-coding genes. Altogether, our analysis suggests that these 38 loci represent one or more new types of small RNAs in Arabidopsis thaliana. Significance Statement: Small RNAs regulate gene expression in plants and are produced through a variety of previously described mechanisms. Here, we examine a set of previously undiscovered small RNA-producing loci that are produced by novel mechanisms. (bioRxiv preprint first posted online Dec. 22, 2017; doi: http://dx.doi.org/10.1101/238691.)
Small RNAs regulate key physiological functions in land plants. Small RNAs can be divided into two categories: microRNAs (miRNAs) and short interfering RNAs (siRNAs); siRNAs are further subdivided into transposon/repetitive region-localized heterochromatic siRNAs and phased siRNAs (phasiRNAs). PhasiRNAs are produced from the miRNA-mediated cleavage of a Pol II RNA transcript; the miRNA cleavage site provides a defined starting point from which phasiRNAs are produced in a distinctly phased pattern. 21-22 nucleotide (nt)-dominated phasiRNA-producing loci (PHAS) are well represented in all land plants to date. In contrast, 24 nt-dominated PHAS loci are known to be encoded only in monocots and are generally restricted to male reproductive tissues. Currently, only one miRNA (miR2275) is known to trigger the production of these 24 nt-dominated PHAS loci. In this study, we use stringent methodologies in order to examine whether or not 24 nt-dominated PHAS loci also exist in Arabidopsis thaliana. We find that highly expressed heterochromatic siRNAs were consistently misidentified as 24 nt-dominated PHAS loci using multiple PHAS-detecting algorithms. We also find that MIR2275 is not found in A. thaliana, and it seems to have been lost in the last common ancestor of Brassicales. Altogether, our research highlights the potential issues with widely used PHAS-detecting algorithms which may lead to false positives when trying to annotate new PHAS, especially 24 nt-dominated loci. KEYWORDS: heterochromatic siRNA, MIR2275, phased siRNA, small RNA
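The "distinctly phased pattern" described above can be pictured as reads whose 5′ ends fall at whole multiples of the phase length from the miRNA cleavage site. The sketch below computes that share for one strand. It is a simplified illustration under stated assumptions: the function name `in_phase_fraction` and its parameters are hypothetical, it ignores the register offset of opposite-strand reads, and it is not one of the PHAS-detecting algorithms evaluated in the study.

```python
def in_phase_fraction(start_positions, cleavage_site, phase_length=21):
    """Fraction of reads whose 5' start is in register with the cleavage site.

    start_positions: 5' start coordinates of same-strand reads at one locus.
    cleavage_site:   coordinate of the first nucleotide after the cut
                     (hypothetical input; assumed to be known in advance).
    phase_length:    expected phasing interval in nt (21 or 24, for example).
    """
    starts = list(start_positions)
    if not starts:
        return 0.0
    in_phase = sum(
        1 for pos in starts if (pos - cleavage_site) % phase_length == 0
    )
    return in_phase / len(starts)


# Example: starts at 100, 121 and 142 are in the 21-nt register; 150 is not.
print(in_phase_fraction([100, 121, 142, 150], cleavage_site=100))
```

A score like this depends only on where reads start, so a locus with a few very abundant read positions can score highly by chance, which is consistent with the abstract's caution about false positives among highly expressed heterochromatic siRNA loci.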