2013
DOI: 10.1093/nar/gkt646

Utilizing sequence intrinsic composition to classify protein-coding and long non-coding transcripts

Abstract: It is a challenge to classify protein-coding or non-coding transcripts, especially those re-constructed from high-throughput sequencing data of poorly annotated species. This study developed and evaluated a powerful signature tool, Coding-Non-Coding Index (CNCI), by profiling adjoining nucleotide triplets to effectively distinguish protein-coding and non-coding sequences independent of known annotations. CNCI is effective for classifying incomplete transcripts and sense–antisense pairs. The implementation of C…
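
To make "profiling adjoining nucleotide triplets" concrete, the sketch below counts pairs of adjacent, non-overlapping triplets in a single reading frame. It is only an illustration of that kind of profile, not the CNCI implementation: the function names `adjoining_triplet_counts` and `adjoining_triplet_frequencies` are hypothetical, and the scoring and classification steps of the published tool are not reproduced here.

```python
from collections import Counter

BASES = set("ACGT")


def adjoining_triplet_counts(seq, frame=0):
    """Count pairs of adjacent, non-overlapping triplets read in one frame.

    Hypothetical helper for illustration; not taken from the CNCI code.
    """
    # Normalise case and treat RNA (U) as DNA (T).
    seq = seq.upper().replace("U", "T")
    # Split into complete triplets starting at the chosen frame offset.
    triplets = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    counts = Counter()
    for first, second in zip(triplets, triplets[1:]):
        # Skip pairs that contain ambiguous bases such as N.
        if set(first + second) <= BASES:
            counts[(first, second)] += 1
    return counts


def adjoining_triplet_frequencies(seq, frame=0):
    """Convert the pair counts into relative frequencies (empty dict if none)."""
    counts = adjoining_triplet_counts(seq, frame)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}
```

Calling `adjoining_triplet_counts("ATGGCCATGGCC")` reads the triplets ATG, GCC, ATG, GCC and yields two occurrences of the pair (ATG, GCC) and one of (GCC, ATG). A classifier in this spirit would compare such profiles between known coding and non-coding sequences.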

Cited by 1,577 publications (1,233 citation statements)
References 25 publications
“…It was identified to be an lncRNA rather than a protein-coding transcript by CNCI software (18). We performed a codon substitution frequency analysis using PhyloCSF (19).…”
Section: Research Article (mentioning)
confidence: 99%
“…Different from linear discriminant functions, non-linear kernels have complex discriminant functions for complicated data examples. Usually, classical non-linear kernels designed for particular applications, including polynomial kernels [76], Gaussian kernels [79,80], spectrum kernels [81], weighted degree (WD) kernels [74], WD kernels with shifts [82], string kernels [83,84], Oligo kernels [85], convolutional kernels [86], and so forth, can be used for modeling more complex decision boundaries in predicting various signal sensors [72,74,87].…”
Section: Support Vector Machines and Kernel Methods (mentioning)
confidence: 99%
“…Accumulating evidence showed that the protein-coding genes are accounted for only 50% of final assembled transcriptome data. Mining final non-redundant transcriptome data via long non-coding RNA identification tools such as PLEK [90], lncRScan-SVM [91], FEELnc [92] or measuring protein coding potential of transcripts using various tools such as coding potential calculator (CPC) [93], coding potential assessment tool (CPAT) [94], coding-non-coding index (CNCI) [95] provides us more information about the transcriptome landscape of non-model organism.…”
Section: Transcriptomics Tells More: Focusing On Specific Annotation T… (mentioning)
confidence: 99%