The discriminant power of RNA features for pre-miRNA recognition

Lopes, I. de O. N.; Schliep, Alexander; Carvalho, André C. P. L. F. de

doi:10.1186/1471-2105-15-124

Cited by 36 publications

(21 citation statements)

References 48 publications

“…The application of machine learning to biological data has become important and in pre-miRNA analysis it has become indispensable (Jiang et al, 2007; Lopes, Schliep & De Carvalho, 2014; Gudyś et al, 2013; Ding, Zhou & Guan, 2010; Bentwich, 2008; Batuwita & Palade, 2009; Van der Burgt et al, 2009; Gao et al, 2013). ML is a system which is influenced by different choices that can be made, for example, the selected training and testing datasets, feature selection, and the choice of classification algorithm.…”

Section: Results (mentioning)

confidence: 99%

“…The ‘pseudo’ dataset (8,492 hairpins) is a popular negative dataset used in various studies (Jiang et al, 2007; Chen, Wang & Liu, 2016; Lopes, Schliep & De Carvalho, 2014) on the detection of pre-miRNAs and it was downloaded from Ng & Mishra (2007). No other negative dataset has been used by more than one study on pre-miRNA detection.…”

Section: Methods (mentioning)

confidence: 99%

“…Random forest (RF) is based on DT but many not fully induced decision trees are created and used as an ensemble for predictive analyses (Tin kam Ho, 0000). Random forest has been used in pre-miRNA detection (Lopes, Schliep & De Carvalho, 2014). Multi-layer perceptron (MLP) was previously applied to pre-miRNA detection by Gudyś et al (2013).…”

Section: Methods (mentioning)

confidence: 99%

Delineating the impact of machine learning elements in pre-microRNA detection

Demirci, Allmer (2017), PeerJ

Gene regulation modulates RNA expression via transcription factors. Post-transcriptional gene regulation in turn influences the amount of protein product through, for example, microRNAs (miRNAs). Experimental establishment of miRNAs and their effects is complicated and even futile when aiming to establish the entirety of miRNA target interactions. Therefore, computational approaches have been proposed. Many such tools rely on machine learning (ML) which involves example selection, feature extraction, model training, algorithm selection, and parameter optimization. Different ML algorithms have been used for model training on various example sets, more than 1,000 features describing pre-miRNAs have been proposed and different training and testing schemes have been used for model establishment. For pre-miRNA detection, negative examples cannot easily be established causing a problem for two class classification algorithms. There is also no consensus on what ML approach works best and, therefore, we set forth and established the impact of the different parts involved in ML on model performance. Furthermore, we established two new negative datasets and analyzed the impact of using them for training and testing. It was our aim to attach an order of importance to the parts involved in ML for pre-miRNA detection, but instead we found that all parts are intricately connected and their contributions cannot be easily untangled leading us to suggest that when attempting ML-based pre-miRNA detection many scenarios need to be explored.

“…Thus, ML practitioners frequently perform feature selection procedures to select the most informative features to train useful models. The effects of feature selection in pre-miRNA prediction have been shown in the works of (13,14), where a large number of features are reduced to the most informative ones. Furthermore, the performance of ML models is also tied to the training and testing datasets.…”

Section: Introduction (mentioning)

confidence: 99%

Detection of pre-microRNA with Convolutional Neural Networks

Cruz, Menkovski, Allmer (2019), Preprint

MicroRNAs (miRNAs) are small non-coding RNA sequences that have been implicated in many physiological processes. Furthermore, miRNAs have been shown to be important biomarkers for diseases and their mimics are tested as drug candidates. The experimental discovery of miRNAs is complicated because both miRNAs and their targets need to be expressed for the confirmation of functional interaction. This is difficult since miRNA expression is under spatiotemporal control. This has motivated the development of computational methods for miRNA detection. Such computational methods typically involve the characterization of candidate sequences with features designed by domain experts and the application of statistical or machine learning algorithms. While such features can successfully encode domain knowledge, feature engineering is a difficult and time-consuming task. Additionally, some engineered features pose excessive computational complexity that can hinder the large scale detection of miRNA. In contrast, advances of representation learning methods such as deep learning provide for automatic development of effective features directly from data. In this work, we propose a method that uses domain knowledge to create an efficient image representation of miRNA molecules encoding sequence, structure, and implicitly some thermodynamic information. We then use this low-level feature representation of the molecules to develop a hierarchical deep representation using a convolutional neural network model, which directly detects precursor miRNAs. With this method we achieve state-of-the-art performance on all previously used datasets. Additionally, detection is achieved in real time thereby overcoming the high computational cost for current pre-miRNA feature calculations such as p-value based ones. Finally, the encoding and modeling process opens possibilities for interpretability of the models' behavior, which may lead to novel biological interpretations of miRNA genesis and targeting.

“…The selection and validation of feature set and advances in machine learning techniques give way to the development algorithms with very high accuracy. Recently, studies are conducted exclusively to determine discriminant power of features selected in pre-microRNA identification [26]. A significant change arose due the advent of Next Generation Sequencing techniques, a quite a few algorithms based on enormous data output from sequencing techniques are also available.…”

Section: Introduction (mentioning)

confidence: 99%