2016
DOI: 10.1371/journal.pone.0153268

Accurate Prediction of Transposon-Derived piRNAs by Integrating Various Sequential and Physicochemical Features

Abstract: Background: Piwi-interacting RNA (piRNA) is the largest class of small non-coding RNA molecules. The transposon-derived piRNA prediction can enrich the research contents of small ncRNAs as well as help to further understand generation mechanism of gamete. Methods: In this paper, we attempt to differentiate transposon-derived piRNAs from non-piRNAs based on their sequential and physicochemical features by using machine learning methods. We explore six sequence-derived features, i.e. spectrum profile, mismatch profil…

Citations: cited by 55 publications (56 citation statements)
References: 44 publications
“…We found that simple tri-nucleotide composition feature can efficiently predict sRNAs. Literature search identified six sequence-derived features, such as spectrum profile, mismatch profile, subsequence profile, position-specific matrix, pseudo dinucleotide composition and local structure-sequence triplet elements, which can predict piwi-interacting RNAs (piRNA) [42,43]. We will use the above features in our future study to further improve the prediction model.…”
Section: Discussion (mentioning)
confidence: 99%
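The tri-nucleotide composition and spectrum profile named in this statement both amount to counting contiguous k-mers. The following is a minimal illustrative sketch, not the implementation used by the cited or citing papers; the function name `spectrum_profile`, the default `k=3`, the RNA alphabet, and the normalization by the number of valid windows are all assumptions made for the example.

```python
from itertools import product


def spectrum_profile(seq, k=3, alphabet="ACGU"):
    """Return normalized k-mer frequencies, ordered lexicographically by k-mer.

    Hypothetical helper for illustration: DNA input is mapped to RNA (T -> U),
    and windows containing symbols outside `alphabet` (e.g. N) are skipped.
    """
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of length k over the sequence and count each valid k-mer.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    # Normalize so that sequences of different lengths are comparable.
    return [counts[km] / total if total else 0.0 for km in kmers]


if __name__ == "__main__":
    vec = spectrum_profile("UGACAUGGAACUGAUCCAUG", k=3)
    print(len(vec))  # one entry per possible 3-mer
```

With `k=3` the vector has 4^3 = 64 entries, one per possible tri-nucleotide, which is the kind of fixed-length feature vector a classifier can consume.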
“…Nucleic acid composition: 1) increment of diversity (IDKmer) [226,270,291]; 2) the occurrences of kmers, allowing at most m mismatches (Mismatch) [264,265,292]; 3) the occurrences of kmers, allowing non-contiguous matches (Subsequence) [265,292,293]. Autocorrelation: 4) Moran autocorrelation (MAC) [268,294]; 5) Geary autocorrelation (GAC) [217,295]; 6) Normalized Moreau-Broto autocorrelation (NMBAC) [217,296]. Table 2. List of the 8 new modes for RNA sequences.…”
Section: Category Mode (mentioning)
confidence: 99%
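The Mismatch mode quoted above counts, for each k-mer, the windows of a sequence that match it with at most m mismatches. The sketch below is one straightforward reading of that definition, under stated assumptions: the names `hamming` and `mismatch_profile`, the defaults `k=3` and `m=1`, and the brute-force comparison are illustrative choices, not the algorithm of any cited tool.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def mismatch_profile(seq, k=3, m=1, alphabet="ACGU"):
    """Map each possible k-mer to the number of windows within m mismatches.

    Hypothetical helper for illustration. Brute force: every k-mer is compared
    against every window, which is fine for short sequences and small k.
    """
    seq = seq.upper().replace("T", "U")
    allowed = set(alphabet)
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # Keep only windows made entirely of alphabet symbols.
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    windows = [w for w in windows if set(w) <= allowed]
    return {km: sum(hamming(km, w) <= m for w in windows) for km in kmers}


if __name__ == "__main__":
    profile = mismatch_profile("ACGU", k=3, m=1)
    print(profile["ACG"], profile["ACU"], profile["UUU"])
```

Setting `m=0` reduces this to plain k-mer counts, so the mismatch profile can be seen as a smoothed version of the spectrum profile.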
“…1) The occurrences of kmers, allowing at most m mismatches (Mismatch) [264,265,292]; 2) the occurrences of kmers, allowing non-contiguous matches (Subsequence) [265,292,293].
Autocorrelation: 3) Moran autocorrelation (MAC) [217,294]; 4) Geary autocorrelation (GAC) [217,295]; 5) Normalized Moreau-Broto autocorrelation (NMBAC) [217,296].
Predicted structure composition: 6) Local structure-sequence triplet element (Triplet) [266]; 7) Pseudo-structure status composition (PseSSC) [226]; 8) Pseudo-distance structure status pair composition (PseDPC) [10].
2) PseAAC of Distance-Pairs and Reduced Alphabet (Distance Pair) [271].
Autocorrelation: 3) Physicochemical distance transformation (PDT) [270].
Profile-based features: 4) Select and combine the n most frequent amino acids according to their frequencies (Top-n-gram) [269]; 5) Profile-based Physicochemical distance transformation (PDT-Profile) [270]; 6) Distance-based Top-n-gram (DT) [271]; 7) Profile-based Auto covariance (AC-PSSM) [272]; 8) Profile-based Cross covariance (CC-PSSM) [272]; 9) Profile-based Auto-cross covariance (ACC-PSSM) [272].
Natural Science Mismatch [264] and Subsequence [265]; and 3 are added into the autocorrelation category, i.e., Moran autocorrelation, Geary autocorrelation, and Normalized Moreau-Broto autocorrelation [268]. PseAAC-General is designed to generate the feature vectors for protein sequences.…”
Section: Category Mode (mentioning)
confidence: 99%
“…This molecule is essential for protein synthesis, since it is able to express the information present in DNA. According to [Luo et al 2016], non-coding RNAs (ncRNAs) are important functional RNA molecules that are not translated into proteins [Claverie 2005, Mattick 2005]. Non-coding RNAs are classified as long ncRNAs and short ncRNAs, according to their lengths.…”
Section: Introduction (unclassified)