Accurate prediction of open reading frames (ORFs) is important for studying and using genome sequences. Ribosomes move along mRNA strands with a step of three nucleotides and datasets carrying this information can be used to predict ORFs. The ribosome-protected footprints (RPFs) feature a significant 3-nt periodicity on mRNAs and are powerful in predicting translating ORFs, including small ORFs (sORFs), but the application of RPFs is limited because they are too short to be accurately mapped in complex genomes. In this study, we found a significant 3-nt periodicity in the datasets of populational genomic variants in coding sequences, in which the nucleotide diversity increases every three nucleotides. We suggest that this feature can be used to predict ORFs and develop the Python package ‘OrfPP’, which recovers ~83% of the annotated ORFs in the tested genomes on average, independent of the population sizes and the complexity of the genomes. The novel ORFs, including sORFs, identified from single-nucleotide polymorphisms are supported by protein mass spectrometry evidence comparable to that of the annotated ORFs. The application of OrfPP to tetraploid cotton and hexaploid wheat genomes successfully identified 76.17% and 87.43% of the annotated ORFs in the genomes, respectively, as well as 4704 sORFs, including 1182 upstream and 2110 downstream ORFs in cotton and 5025 sORFs, including 232 upstream and 234 downstream ORFs in wheat. Overall, we propose an alternative and supplementary approach for ORF prediction that can extend the studies of sORFs to more complex genomes.
Ribo-seq, also known as ribosome profiling, refers to the sequencing of ribosome-protected mRNA fragments (RPFs). This technique has greatly advanced our understanding of translation and facilitated the identification of novel open reading frames (ORFs) within untranslated regions or non-coding sequences as well as the identification of non-canonical start codons. However, the widespread application of Ribo-seq has been hindered because obtaining periodic RPFs requires a highly optimized protocol, which may be difficult to achieve, particularly in non-model organisms. Furthermore, the periodic RPFs are too short (28 nt) for accurate mapping to polyploid genomes, but longer RPFs are usually produced with a compromise in periodicity. Here we present RiboNT, a noise-tolerant ORF predictor that can utilize RPFs with poor periodicity. It evaluates RPF periodicity and automatically weighs the support from RPFs and codon usage before combining their contributions to identify translated ORFs. The results demonstrate the utility of RiboNT for identifying both long and small ORFs using RPFs with either good or poor periodicity. We implemented the pipeline on a dataset of RPFs with poor periodicity derived from membrane-bound polysomes of Arabidopsis thaliana seedlings and identified several small ORFs (sORFs) evolutionarily conserved in diverse plant species. RiboNT should greatly broaden the application of Ribo-seq by minimizing the requirement of RPF quality and allowing the use of longer RPFs, which is critical for organisms with complex genomes because these RPFs can be more accurately mapped to the position from which they were derived.
The roles of short/small open reading frames (sORFs) have been increasingly recognized in recent years due to the rapidly growing number of sORFs identified in various organisms due to the development and application of the Ribo-Seq technique, which sequences the ribosome-protected footprints (RPFs) of the translating mRNAs. However, special attention should be paid to RPFs used to identify sORFs in plants due to their small size (~30 nt) and the high complexity and repetitiveness of the plant genome, particularly for polyploidy species. In this work, we compare different approaches to the identification of plant sORFs, discuss the advantages and disadvantages of each method, and provide a guide for choosing different methods in plant sORF studies.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.