2021
DOI: 10.1093/bioinformatics/btab339

ORFLine: a bioinformatic pipeline to prioritize small open reading frames identifies candidate secreted small proteins from lymphocytes

Abstract: Motivation: The annotation of small open reading frames (smORFs) of less than 100 codons (<300 nucleotides) is challenging due to the large number of such sequences in the genome. Results: In this study, we developed a computational pipeline, which we have named ORFLine, that stringently identifies smORFs and classifies them according to their position within transcripts. We identified a total of 5744 unique smORFs in da…
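To make the size cutoff in the abstract concrete, here is a minimal sketch of a naive forward-strand scan for ATG-to-stop ORFs shorter than 100 codons. It is not the ORFLine method, which the abstract describes only at a high level. The function name `find_smorfs`, the choice to exclude the stop codon from the codon count, and the rule of taking the first in-frame ATG are all assumptions made for illustration.

```python
# Illustrative only: a naive forward-strand ORF scan, not the ORFLine pipeline.
STOP_CODONS = {"TAA", "TAG", "TGA"}
MAX_CODONS = 100  # the abstract defines smORFs as fewer than 100 codons


def find_smorfs(seq, max_codons=MAX_CODONS):
    """Return (start, end, codon_count) tuples for short ATG..stop ORFs.

    Coordinates are 0-based and end-exclusive, and the span includes the
    stop codon. codon_count excludes the stop codon (an assumption).
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            # Walk codon by codon until an in-frame stop or the sequence end.
            j = i
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                break  # no in-frame stop downstream in this frame
            codons = (j - i) // 3
            if codons < max_codons:
                hits.append((i, j + 3, codons))
            i = j + 3  # resume scanning after the stop codon
    return hits
```

For example, `find_smorfs("CCATGAAATTTTAGGG")` reports one ORF of three codons starting at position 2. A real pipeline would also scan the reverse strand and, as the abstract notes, classify each smORF by its position within a transcript.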

Citations: cited by 9 publications (7 citation statements)
References: 44 publications
“…The recent development of approaches combining ribosome profiling (Ingolia et al, 2009), mass spectrometry, and computational analysis (J. Chen, Brunner, et al, 2020) has enabled the identification of thousands of open reading frames (ORFs) in lymphocytes encoding new proteins and small peptides with yet unknown functions (Turner et al, 2021).…”
Section: Introduction
mentioning
confidence: 99%
“…In recent years, various methods have been developed to explore smORFs [32, 46–50]. Construction of the smORF database in a specific organism typically begins with genome-wide scanning, resulting in a preliminary dataset.…”
Section: Discussion
mentioning
confidence: 99%
“…Transcriptomic data, which encompass the entire set of RNA transcripts in a given cell or tissue, provide valuable information about gene expression patterns and can aid in the identification of ORFs. Transcriptomic data can be obtained through methods such as RNA sequencing (RNA-seq) and, still nowadays, microarrays (Kiniry, Michel and Baranov, 2020; Hu et al, 2021). The integration of transcriptomic data with genomic sequences helps to refine ORF predictions by identifying expressed regions and providing evidence for the existence of functional genes.…”
Section: Input Data For Prediction Tools
mentioning
confidence: 99%
“…GeneScan utilizes Fourier techniques to detect the three-base periodicity in genomic sequences, aiding in recognizing coding regions (Tiwari et al, 1997). ORFLine, a computational pipeline, is adept at identifying and classifying sORFs, providing insights into potential secreted proteins from lymphocytes (Hu et al, 2021). It should be noted that all these methods may require extensive training datasets and might not be as universally applicable as ab initio, homology-based, or hybrid methods.…”
mentioning
confidence: 99%
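The three-base periodicity mentioned in the last statement can be illustrated with a short sketch. The code below computes the spectral power at frequency 1/3 over the four base-indicator sequences, which is the general idea behind Fourier-based coding-region detection. It is not claimed to reproduce GeneScan's actual algorithm; the function name `period3_power` and the normalisation by the squared sequence length are assumptions chosen for illustration.

```python
import cmath


def period3_power(seq):
    """Spectral power at frequency 1/3, summed over A, C, G and T.

    Each base gets a 0/1 indicator sequence; we evaluate its discrete
    Fourier transform at frequency 1/3 only. The result is divided by
    len(seq) ** 2 (an illustrative normalisation, assumed here).
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0
    total = 0.0
    for base in "ACGT":
        # Sum of unit phasors at the positions where this base occurs.
        coeff = sum(
            cmath.exp(-2j * cmath.pi * pos / 3)
            for pos, b in enumerate(seq)
            if b == base
        )
        total += abs(coeff) ** 2
    return total / (n * n)
```

A perfectly codon-periodic toy sequence such as `"ATG" * 30` gives a value of 1/3 under this normalisation, while a period-4 sequence such as `"ACGT" * 30` gives a value near zero. In practice such a score would be computed in a sliding window and compared against a background level.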