2014
DOI: 10.1186/preaccept-1016030070136510

mirMark: a site-level and UTR-level classifier for miRNA target prediction

Abstract: MiRNAs play important roles in many diseases, including cancers. However, computational prediction of miRNA target genes is challenging, and the accuracies of existing methods remain poor. We report mirMark, a new machine learning-based method of miRNA target prediction at the site and UTR levels. This method uses experimentally verified miRNA targets from miRecords and mirTarBase as training sets and considers over 700 features. By combining Correlation-based Feature Selection with a variety of statistical or ma…

Citations: cited by 11 publications (26 citation statements)
References: 22 publications
“…The first dataset was downloaded from the supplementary tables of the DeepMirTar study, which contains 3,963 positive pairs of miRNA:target and 3,905 negative pairs of miRNA:target. The positive pairs in the DeepMirTar dataset were obtained from three resources: mirMark data (Menor, et al, 2014), CLASH data (Helwak, et al, 2013), and PAR-CLIP data (Hafner, et al, 2010). And only the target sites located in 3'UTRs, and the target sites with canonical seeds (exact W-C pairing of 2-7 or 3-8 nts of the miRNA) and non-canonical seeds (pairing at positions 2-7 or 3-8, allowing G-U pairs and up to one bulged or mis-matched nucleotide) were included.…”
Section: Datasets
Citation type: mentioning
confidence: 99%
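The seed definitions quoted above can be illustrated with a minimal Python sketch. It is not code from mirMark, DeepMirTar, or miTAR; the function name `classify_seed` and the constants are hypothetical, and the non-canonical rule is simplified here to "G-U wobble pairs allowed, at most one mismatched position" (bulged nucleotides are not modeled). The example miRNA is the let-7a sequence, and the target site is assumed to be given 5'→3' with the same length as the seed window.

```python
# Hypothetical illustration of canonical vs. non-canonical seed matching.
# Simplifying assumption: bulges are NOT modeled, only mismatches and G-U wobbles.

WC_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}  # Watson-Crick pairs
GU_PAIRS = {("G", "U"), ("U", "G")}                          # wobble pairs


def classify_seed(mirna: str, site: str, start: int = 2, end: int = 7) -> str:
    """Classify the pairing between miRNA positions start..end (1-based)
    and a target site segment of equal length, both written 5'->3'."""
    seed = mirna.upper().replace("T", "U")[start - 1:end]
    # The duplex is antiparallel, so reverse the site to line it up with the seed.
    target = site.upper().replace("T", "U")[::-1]
    if len(target) != len(seed):
        raise ValueError("site must have the same length as the seed window")

    pairs = list(zip(seed, target))
    if all(p in WC_PAIRS for p in pairs):
        return "canonical"
    # Count positions that are neither Watson-Crick nor G-U wobble pairs.
    mismatches = sum(1 for p in pairs if p not in WC_PAIRS and p not in GU_PAIRS)
    if mismatches <= 1:
        return "non-canonical"
    return "none"


LET7A = "UGAGGUAGUAGGUUGUAUAGUU"

print(classify_seed(LET7A, "UACCUC"))        # exact W-C pairing at positions 2-7
print(classify_seed(LET7A, "CUACCU", 3, 8))  # exact W-C pairing at positions 3-8
print(classify_seed(LET7A, "UACCUU"))        # one G-U wobble at position 2
print(classify_seed(LET7A, "AAAAAA"))        # too many mismatches
```

A real pipeline would also have to handle bulges and scan the whole 3'UTR for candidate windows; this sketch only classifies one pre-aligned window.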
“…They were developed earlier, hence, the training datasets they used are much smaller. 507 target site-level and 2,891 gene-level miRNA:target pairs from mirMark repository (Menor, et al, 2014) were taken as the positive training dataset in the deepTarget study. mirMark repository is part of the DeepMirTar dataset, subsequently, we compared miTAR1 with the results reported from deepTarget study.…”
Section: Performance Comparison With Earlier Studies Using Test Datasets
Citation type: mentioning
confidence: 99%
“…Menor et al [31] described 151 site-level features between miRNA-target pairs for target prediction, categorizing them into seven groups: binding energy, the type of seed matching, miRNA pairing, target site accessibility, target site composition, target site conservation, and the location of target sites. To relieve these feature engineering required for target prediction in conventional approaches, deepTarget exploits the unsupervised feature learning using the RNN encoder-decoder model [44].…”
Section: Modeling RNAs Using RNN Based Autoencoder
Citation type: mentioning
confidence: 99%
“…To boost the sensitivity of miRNA target prediction, a variety of features have been proposed. According to Menor et al [31], as many as 151 kinds of features appear in the literature, which can be broadly grouped into four common types [37]: the degree of Watson-Crick matches of a seed sequence (see Figure 2); the degree of sequence conservation across species; Gibbs free energy, which measures the stability of the binding of a miRNA-mRNA pair, and the site accessibility, which measures the hybridization possibility of a pair from their secondary structures.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…The trainees should be able to creatively transform data, by taking advantage of prior biological knowledge such as pathway or network information 4,5 . The trainees should have courses in statistics to thoroughly understand issues such as sample size, power, multiple hypothesis testing, classification (unsupervised learning), and generalized regression techniques (supervised learning) 6,7 . Training in multi-omics data integration (from the same population cohort) and meta-omics data integration (from heterogeneous populations) will be paramount to derive meaningful discoveries on molecular subtypes of diseases 8 .…”
Section: Areas Of Biomedical Data Science Demanding New Workforce
Citation type: mentioning
confidence: 99%