2020
DOI: 10.1093/bib/bbaa045

CodAn: predictive models for precise identification of coding regions in eukaryotic transcripts

Abstract: Motivation: Characterization of the coding sequences (CDSs) is an essential step in transcriptome annotation. Incorrect identification of CDSs can lead to the prediction of non-existent proteins that can eventually compromise knowledge if databases are populated with similar incorrect predictions made in different genomes. Also, the correct identification of CDSs is important for the characterization of the untranslated regions (UTRs), which are known to be important regulators of the mRNA tra…

Cited by 18 publications (11 citation statements)
References 39 publications (62 reference statements)
“…A core element in the downstream analysis for RNA-seq data involves the translation of assembled sequences into their corresponding amino acid sequences, and on the nucleotide level into the protein coding sequences (CDS) not containing any untranslated regions (UTRs). A correct characterization of CDS is not only important for profiling the protein-coding fraction of a transcriptome, but also for an accurate classification of UTRs and non-coding sequences/regions which may be of interest in the context of gene regulation [ 146 ].…”
Section: Sequence Translation
confidence: 99%
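
To make the CDS/UTR distinction in the statement above concrete, here is a minimal Python sketch that splits a transcript into 5' UTR, CDS, and 3' UTR by taking the longest forward-strand ORF (ATG to the first in-frame stop) and translating it with the standard genetic code. This is a naive illustration only, not CodAn's method (CodAn relies on trained predictive models); the function names `longest_orf` and `split_transcript` and the toy sequence are assumptions made for this example.

```python
from itertools import product

# Standard genetic code, built from the canonical T, C, A, G codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
STOPS = {"TAA", "TAG", "TGA"}


def normalize(seq):
    """Upper-case the sequence and map RNA 'U' to DNA 'T'."""
    return seq.upper().replace("U", "T")


def longest_orf(seq):
    """Return (start, end) of the longest ATG..stop ORF on the forward strand.

    Hypothetical helper: scans the three forward frames only and returns
    None when no complete ORF (start codon plus in-frame stop) exists.
    """
    seq = normalize(seq)
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                end = i + 3  # include the stop codon in the CDS
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None
    return best


def split_transcript(seq):
    """Split a transcript into 5' UTR, CDS, 3' UTR and the translated protein."""
    seq = normalize(seq)
    orf = longest_orf(seq)
    if orf is None:
        return None
    start, end = orf
    cds = seq[start:end]
    # Translate every codon except the terminal stop; unknown codons become 'X'.
    protein = "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 3, 3)
    )
    return {"utr5": seq[:start], "cds": cds, "utr3": seq[end:], "protein": protein}


if __name__ == "__main__":
    toy = "GGCACC" + "ATGGCTTGGAAATAA" + "CTTTGC"  # made-up transcript
    print(split_transcript(toy))
```

A longest-ORF rule like this is easily misled by partial transcripts, reverse-strand genes, and spurious ORFs in non-coding RNAs, which is the kind of error that dedicated CDS predictors aim to reduce.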
“…In addition, we searched for putative regulatory elements that may be acting in the transcriptional and post-transcriptional levels of the transcripts. First, to detect the transcriptional factors (TFs) GRHL1 and NFI in the transcriptome assembly, we performed a coding sequence prediction using CodAn (v1.0; [ 93 ]) with the vertebrate full model. Next, we performed a BLAST search using a database containing the GRHL1 and NFI peptide sequences available from Swiss-Prot, Ensembl, and previously published snake transcriptome assemblies from the TSA database [ 9 , 87 , 94 , 95 ].…”
Section: Methods
confidence: 99%
“…Certainly, characterization of coding potential has its own significance for genome annotation, so as to partition different functional regions on the genomes. Prodigal [60] , TransDecoder [48] , GeneMarkS-T [132] and CodAn [101] are such approaches that were developed for precise identification of coding regions in transcripts, these methods have an important referential value for lncRNA identification. For example, using these tools, we can further determine the ORF-related features which were usually as a vital parameter during lncRNA identification.…”
Section: Survey Of the Current In-silico Tools Of
confidence: 99%