2011
DOI: 10.1093/bioinformatics/btr016

A k-mer scheme to predict piRNAs and characterize locust piRNAs

Abstract: Motivation: Identifying piwi-interacting RNAs (piRNAs) of non-model organisms is a difficult and unsolved problem because piRNAs lack conservative secondary structure motifs and sequence homology in different species. Results: In this article, a k-mer scheme is proposed to identify piRNA sequences, relying on the training sets from non-piRNA and piRNA sequences of five model species sequenced: rat, mouse, human, fruit fly and nematode. Compared with the existing ‘static’ scheme based on the position-specific ba…
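To make the k-mer idea concrete, here is a minimal sketch of how a short RNA sequence could be turned into a vector of k-mer frequencies that a classifier might consume. It is an illustration under stated assumptions, not the authors' implementation: the function name `kmer_frequencies`, the default `k_max=3`, and the normalization by window count are all hypothetical choices that the abstract does not specify.

```python
from itertools import product


def kmer_frequencies(seq, k_max=3, alphabet="ACGU"):
    """Return normalized k-mer frequencies for k = 1..k_max as a flat list.

    Assumptions (not from the source): overlapping windows, frequencies
    normalized per k by the number of windows, DNA 'T' mapped to RNA 'U'.
    """
    seq = seq.upper().replace("T", "U")
    features = []
    for k in range(1, k_max + 1):
        # Enumerate all 4**k possible k-mers in a fixed, reproducible order.
        kmers = ["".join(p) for p in product(alphabet, repeat=k)]
        counts = dict.fromkeys(kmers, 0)
        windows = max(len(seq) - k + 1, 0)
        for i in range(windows):
            word = seq[i:i + k]
            # Windows containing symbols outside the alphabet (e.g. 'N') are skipped.
            if word in counts:
                counts[word] += 1
        features.extend(counts[w] / windows if windows else 0.0 for w in kmers)
    return features


# Usage: k = 1..3 yields 4 + 16 + 64 = 84 features per sequence.
vector = kmer_frequencies("UGAGGUAGUAGGUUGUAUAGUU", k_max=3)
print(len(vector))  # 84
```

A fixed enumeration order matters here: every sequence must map to the same feature positions so that vectors from the positive (piRNA) and negative (non-piRNA) training sets are comparable.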

Cited by 121 publications (173 citation statements)
References 30 publications
“…Non-piRNA sequences (negative samples) were obtained from Zhang et al (2011). Because there are far less nonpiRNA small ncRNAs in one species than piRNAs, only 34,675 real non-piRNA sequences from 861 organisms are contained in NONCODE (Bu et al, 2011), which was collected by Zhang et al (2011). The remaining 158,646 sequences are well-designed and were generated by random processes according to real data, which was developed by Zhang et al (2011).…”
Section: Training Dataset and Length Filter (mentioning)
confidence: 99%
“…Because there are far less nonpiRNA small ncRNAs in one species than piRNAs, only 34,675 real non-piRNA sequences from 861 organisms are contained in NONCODE (Bu et al, 2011), which was collected by Zhang et al (2011). The remaining 158,646 sequences are well-designed and were generated by random processes according to real data, which was developed by Zhang et al (2011). Furthermore, most non-piRNA sequences are considerably longer than piRNA sequences; therefore, if a sequence is too long, it is clearly not a piRNA and should be removed.…”
Section: Training Dataset and Length Filter (mentioning)
confidence: 99%
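The length filter described in the citing statement above can be sketched in a few lines. This is only an illustration: the function name `length_filter` and the cutoff `max_len=35` are hypothetical, since the quoted text does not state which threshold was used.

```python
def length_filter(sequences, max_len=35):
    """Keep only sequences short enough to be piRNA candidates.

    max_len is a hypothetical cutoff chosen for illustration; the quoted
    statement does not give the threshold actually applied.
    """
    return [s for s in sequences if len(s) <= max_len]


# Usage: the 50-nt sequence is dropped, the 30-nt sequence is kept.
candidates = length_filter(["A" * 30, "A" * 50])
print(len(candidates))  # 1
```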