2020
DOI: 10.1101/2020.10.20.348052
Preprint

orfipy: a fast and flexible tool for extracting ORFs

Abstract: Searching for ORFs in transcripts is a critical step prior to annotating coding regions in newly-sequenced genomes and to search for alternative reading frames within known genes. With the tremendous increase in RNA-Seq data, faster tools are needed to handle large input datasets. These tools should be versatile enough to fine-tune search criteria and allow efficient downstream analysis. Here we present a new python based tool, orfipy, which allows the user to flexibly search for open reading frames in fasta s…

Cited by 4 publications (10 citation statements)
References 13 publications
“…Long ORFs are often used, along with other evidence, to initially identify candidate protein-coding regions or functional RNA-coding regions in a given DNA sequence, but the presence of an ORF does not necessarily mean that the region is always translated [24]. As BLAST and BLAT, the web-based ORF Finder (https://www.ncbi.nlm.nih.gov/orffinder/), ORF Predictor (http://bioinformatics.ysu.edu/tools/OrfPredictor.html) and command-line tools (ORF Investigator [25] and orfipy [13]) offer a range of ORF searches, but its usage can be challenging for biologists due to lack of computer programming literacy and limited query sequence length. To maximise the flexibility, the easyfm ORF provides a fast and efficient approach for all possible translation and extraction of ORFs from nucleotide sequences (FASTA format of nucleotide and protein output from six-frame translation) (Fig 4).…”
Section: Results (mentioning)
confidence: 99%
“…Conveniently, FASTQ files can also be converted to FASTA files, the most commonly used file format for NGS data that enables direct sequencing of target genes. Many available tools (easySEARCH [10]; BlasterJS [11]; Sequenceserver [12]; orfipy [13]); Samtools and BCFtools [14] including easyfm ) have not surprisingly focused on manipulating (analyse, collect, organise, interpret, and present data in meaningful ways) the FASTA file format to generate biologically relevant insights.…”
Section: Introduction (mentioning)
confidence: 99%
“…Protein sequence used as evidence in MAKER 30 were generated in one of two ways: 1) For Arabidopsis, yeast, and rice, RNA-Seq reads were assembled using Trinity (v2.6.6) 65 , followed by open reading frame (ORF) prediction and translation using orfipy 29 or TransDecoder (v3.0.1) 65 . 2) For Arabidopsis only, data was downloaded from Phytozome 66 as predicted protein sequences for nine species: Arabidopsis thaliana, (Glycine max, Populus trichocarpa, Arabidopsis lyrata, Conradina grandiflora, Setaria italica, Oryza sativa, Physcomitrella patens, Chlamydomonas reinhardtii, and Brassica rapa).…”
Section: RNA-Seq Genome and Protein Input Data (mentioning)
confidence: 99%
“…The BAM file generated by mapping reads to the Araport11-annotated indexed genome using HiSat2 (v2.1.0) 67 was provided as training for the assemblers. The resultant assembled transcripts were used to predict ORFs using Transdecoder 65 or orfipy 29 . We selected those complete ORFs over 150 nt.…”
Section: Evidence-based Annotation Of Genes By Direct Inference (mentioning)
confidence: 99%
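
As a rough illustration of the filtering step quoted above ("complete ORFs over 150 nt"), the following is a minimal pure-Python sketch. It is not orfipy's or TransDecoder's implementation or API. The function names `reverse_complement` and `complete_orfs` are hypothetical, and it assumes that a "complete" ORF runs from an ATG to the first in-frame stop codon of the standard genetic code, with the stop codon counted in the length.

```python
# Illustrative sketch only: not the orfipy or TransDecoder implementation.
# Assumptions: standard genetic code, ATG as the only start codon,
# unambiguous uppercase/lowercase A/C/G/T input.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def complete_orfs(seq, min_len=150):
    """Yield (strand, frame, start, end, orf_seq) for complete ORFs.

    An ORF is reported when it starts at ATG, ends at the first in-frame
    stop codon, and is longer than min_len nucleotides (stop included).
    Coordinates are 0-based, end-exclusive, relative to the scanned strand.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START_CODON:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3
                    if end - start > min_len:
                        yield strand, frame, start, end, s[start:end]
                    start = None


if __name__ == "__main__":
    # Hypothetical toy transcript: ATG + 50 alanine codons + stop = 156 nt.
    transcript = "ATG" + "GCT" * 50 + "TAA"
    for hit in complete_orfs(transcript):
        print(hit[:4])
```

Scanning the reverse complement as well as the forward strand covers all six reading frames; whether the 150 nt threshold in the cited work is inclusive, or counts the stop codon, is not stated in the excerpt, so the comparison above is an assumption.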