Small open reading frames (small ORFs/sORFs/smORFs) are potentially coding sequences smaller than 100 codons that have historically been considered junk DNA by gene prediction software and in annotation screening; however, the advent of next-generation sequencing has contributed to the deeper investigation of junk DNA regions and their transcription products, resulting in the emergence of smORFs as a new focus of interest in systems biology. Several smORF peptides were recently reported in noncanonical mRNAs as new players in numerous biological contexts; however, their relevance is still overlooked in coding potential analysis. Hence, this review proposes a smORF classification based on transcriptional features, discussing the most promising approaches to investigate smORFs based on their different characteristics. First, smORFs were divided into nonexpressed (intergenic) and expressed (genic) smORFs. Second, genic smORFs were classified as smORFs located in noncoding RNAs (ncRNAs) or canonical mRNAs. Finally, smORFs in ncRNAs were further subdivided into sequences located in small or long RNAs, whereas smORFs located in canonical mRNAs were subdivided into several specific classes depending on their localization along the gene. We hope that this review provides new insights into large-scale annotations and reinforces the role of smORFs as essential components of a hidden coding DNA world.
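The three-step classification described above can be sketched as a small decision procedure. This is only an illustrative sketch, not the review's implementation: the names `SmORF`, `TranscriptType`, `classify`, and the free-text `mrna_region` field are assumptions introduced here, and the specific mRNA localization classes are left as an unspecified label because the abstract does not name them.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional

MAX_CODONS = 100  # smORFs are described as smaller than 100 codons


class TranscriptType(Enum):
    """Kind of transcript hosting the smORF (hypothetical names)."""
    SMALL_NCRNA = "small ncRNA"
    LONG_NCRNA = "long ncRNA"
    MRNA = "canonical mRNA"


@dataclass
class SmORF:
    name: str
    codons: int
    # None means no transcript evidence, i.e. a nonexpressed (intergenic) smORF.
    transcript: Optional[TranscriptType] = None
    # Placeholder for the localization class along the gene; the abstract
    # does not list these classes, so any value used here is an assumption.
    mrna_region: Optional[str] = None


def classify(orf: SmORF) -> List[str]:
    """Return the path through the classification tree, root first."""
    if orf.codons >= MAX_CODONS:
        raise ValueError(f"{orf.name}: {orf.codons} codons is not a smORF")

    # Step 1: nonexpressed (intergenic) versus expressed (genic).
    if orf.transcript is None:
        return ["nonexpressed (intergenic)"]
    path = ["expressed (genic)"]

    # Step 2: ncRNA versus canonical mRNA.
    if orf.transcript is TranscriptType.MRNA:
        path.append("canonical mRNA")
        # Step 3a: subdivide by localization along the gene, if known.
        if orf.mrna_region is not None:
            path.append(orf.mrna_region)
    else:
        path.append("ncRNA")
        # Step 3b: subdivide by small versus long RNA.
        path.append(orf.transcript.value)
    return path
```

For example, `classify(SmORF("x", 50, TranscriptType.LONG_NCRNA))` walks the genic, ncRNA, long ncRNA branch, whereas a record with no transcript evidence stops at the intergenic leaf.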
Genes encoding small open reading frames (smORFs) have been characterized as essential players in developmental processes. The smORF tarsaless/millepattes/polished-rice (tal/pri/mlpt) has been thoroughly investigated in holometabolous insects, such as the fruit fly Drosophila melanogaster and the red flour beetle Tribolium castaneum, while its function in hemimetabolous insects remains unknown. Thus, we analyzed the function of the tal/pri/mlpt ortholog in a hemimetabolous insect, the kissing bug Rhodnius prolixus (Rp). First, sequence analysis shows that the Rp-tal/pri/mlpt polycistronic mRNA encodes two small peptides (11 to 14 amino acids) containing an LDPTG motif. Interestingly, a new hemipteran-specific conserved peptide of approximately 80 amino acids was also identified by in silico analysis. In silico docking analysis supports the high-affinity binding of the small LDPTG peptides to the transcription factor Shavenbaby. Rp-tal/pri/mlpt in situ hybridization and knockdown via RNA interference showed a conserved role of Rp-tal/pri/mlpt during embryogenesis, with a major role in the regulation of thoracic versus abdominal segmentation, leg development and head formation. Altogether, our study shows that the segmentation role of tal/pri/mlpt is conserved in the common ancestor of Paraneoptera and suggests that polycistronic genes might generate order-specific smORFs.