Abstract: Noncoding RNA (ncRNA) is a class of RNA that plays an important role in many biological processes, diseases, and cancers, although it cannot be translated into proteins. With the development of next-generation sequencing technology, thousands of novel RNAs with long open reading frames (ORFs, longest ORF length > 303 nt) and short ORFs (longest ORF length ≤ 303 nt) have been discovered in a short time. Identifying ncRNAs more precisely among novel unannotated RNAs is an important step for RNA functional analysis, RNA re…
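The long-ORF/short-ORF split above can be illustrated with a minimal sketch. The 303 nt threshold comes from the abstract; everything else is an assumption made for illustration: the function names `longest_orf_length` and `orf_class` are hypothetical, only the three forward reading frames are scanned, an ORF is taken to run from the first ATG to the next in-frame stop codon, and the stop codon is counted in the length. The original tool may define ORFs differently.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
ORF_THRESHOLD_NT = 303  # threshold quoted in the abstract


def longest_orf_length(seq: str) -> int:
    """Length (nt) of the longest ATG...stop ORF in the three forward frames.

    Assumption: the stop codon is included in the reported length.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    best = 0
    for frame in range(3):
        start = None  # index of the current in-frame ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None  # look for the next ORF in this frame
    return best


def orf_class(seq: str) -> str:
    """Label a transcript by the abstract's > 303 nt / <= 303 nt split."""
    if longest_orf_length(seq) > ORF_THRESHOLD_NT:
        return "long-ORF"
    return "short-ORF"


if __name__ == "__main__":
    toy = "CCATGAAATTTTAGGG"
    print(longest_orf_length(toy), orf_class(toy))
```

A transcript whose longest ORF is exactly 303 nt falls in the short-ORF group under this reading, matching the "≤ 303 nt" wording.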
“…Composition transition distribution (CTD) [1] is primarily proposed for predicting the protein folding class, which is a global protein sequence descriptor established by Dubchak’s work [24]. Lately, CTD features are found to relate to RNA structure and are seldom used to predict the interactions between lncRNAs and miRNAs.…”
Section: Methods (mentioning)
confidence: 99%
“…Likewise, Ts, Gs, and Cs were 0.1, 0.3, 0.45, 0.6, 0.85, 0.25, 0.5, 0.65, 0.8, 0.95, 0.2, 0.4, 0.55, 0.75, and 1. We used A0, A1, A2, A3, A4, T0, T1, T2, T3, T4, G0, G1, G2, G3, G4, C0, C1, C2, C3, and C4 to represent the 20 features [1].…”
Section: Methods (mentioning)
confidence: 99%
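The 20 features named in the statement above (A0 to A4, T0 to T4, G0 to G4, C0 to C4) look like the "distribution" part of a CTD descriptor: for each nucleotide, the relative positions of its first, 25%, 50%, 75%, and last occurrence. The following is a rough sketch under that reading only. The quoted text is elided and does not give the rounding rule, so the choice of `ceil(fraction * count)` for the occurrence rank, the 1-based positions, and the function name `distribution_features` are all assumptions.

```python
import math


def distribution_features(seq, alphabet="ATGC",
                          fractions=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Relative positions of the first/25%/50%/75%/last occurrence per base.

    Returns a dict keyed A0..A4, T0..T4, G0..G4, C0..C4 (naming from the
    quoted statement). The rank rule below is an assumption.
    """
    seq = seq.upper().replace("U", "T")
    n = len(seq)
    feats = {}
    for base in alphabet:
        # 1-based positions of every occurrence of this base
        positions = [i + 1 for i, ch in enumerate(seq) if ch == base]
        for j, frac in enumerate(fractions):
            key = f"{base}{j}"
            if not positions:
                feats[key] = 0.0  # base absent: assumed to map to 0
                continue
            # assumed rule: first occurrence for 0%, else ceil(frac * count)
            rank = max(1, math.ceil(frac * len(positions)))
            feats[key] = positions[rank - 1] / n
    return feats


if __name__ == "__main__":
    for name, value in distribution_features("ATGCATGCAT").items():
        print(name, value)
```

Other implementations use floor-based ranks, so values for short sequences can differ slightly from those produced here.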
“…Although noncoding RNAs (ncRNAs) [1] cannot encode proteins, they play indispensable roles in numerous life processes [2, 3, 4, 5, 6, 7]. Accumulated studies show that many ncRNAs are involved in various life regulation processes [8, 9].…”
Long non-coding RNA (lncRNA) and microRNA (miRNA) are both non-coding RNAs that play significant regulatory roles in many life processes. Accumulating evidence shows that the interaction patterns between lncRNAs and miRNAs are closely related to cancer development, gene regulation, cellular metabolic processes, etc. Meanwhile, with the rapid development of RNA sequencing technology, numerous novel lncRNAs and miRNAs have been found, which might help to reveal novel regulatory patterns. However, the growing number of unknown interactions between lncRNAs and miRNAs may hinder the discovery of these patterns, and wet-lab experiments to identify potential interactions are costly and time-consuming. Furthermore, few computational tools are available for predicting lncRNA–miRNA interactions at the sequence level. In this paper, we propose a hybrid sequence feature-based model, LncMirNet (lncRNA–miRNA interactions network), to predict lncRNA–miRNA interactions via deep convolutional neural networks (CNN). First, four categories of sequence-based features are introduced to encode lncRNA/miRNA sequences: k-mer (k = 1, 2, 3, 4), composition transition distribution (CTD), doc2vec, and graph embedding features. Then, to fit the CNN learning pattern, a histogram-dd method is used to fuse the multiple types of features into a matrix. Finally, LncMirNet outperformed six other state-of-the-art methods on a real dataset collected from lncRNASNP2 under five-fold cross validation. LncMirNet increased accuracy and area under the curve (AUC) by more than 3% each over the other tools, and improved the Matthews correlation coefficient (MCC) by more than 6%. These results show that LncMirNet can predict potential interactions between lncRNAs and miRNAs with high confidence.
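Of the four feature categories listed in this abstract, the k-mer encoding (k = 1, 2, 3, 4) is the simplest to illustrate. The sketch below is not taken from LncMirNet. It assumes overlapping windows, per-k normalization to frequencies, and a fixed lexicographic ordering of k-mers over the alphabet ACGT; the function name `kmer_frequencies` is hypothetical.

```python
from itertools import product


def kmer_frequencies(seq, ks=(1, 2, 3, 4), alphabet="ACGT"):
    """Concatenated k-mer frequency vector for each k in `ks`.

    With the defaults the vector has 4 + 16 + 64 + 256 = 340 entries.
    Windows containing characters outside `alphabet` (e.g. N) are skipped.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    features = []
    for k in ks:
        kmers = ["".join(p) for p in product(alphabet, repeat=k)]
        counts = dict.fromkeys(kmers, 0)
        # slide an overlapping window of width k along the sequence
        for i in range(max(len(seq) - k + 1, 0)):
            window = seq[i:i + k]
            if window in counts:
                counts[window] += 1
        total = sum(counts.values())
        # normalize within each k so every block sums to 1 (or stays all zero)
        features.extend(counts[m] / total if total else 0.0 for m in kmers)
    return features


if __name__ == "__main__":
    vec = kmer_frequencies("ACGUACGU")
    print(len(vec), vec[:4])
```

A fixed-length vector like this can be computed for both the lncRNA and the miRNA of a candidate pair, regardless of sequence length.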
“… [37], [134], [148]. In addition, the process of a deep neural network operation likes a black box, from which it is hard and difficult to interpret the performance and evaluate the importance of every input feature [149]. Such methods include LncRNA-MFDL, DeepLNC, LNCAdeep, NCResNet and so on [37], [134], [148], [149].…”
Section: General Profile for lncRNA Identification Tools (mentioning)
confidence: 99%
“…During the process of developing NCResNet, Yang and his colleagues estimated the running time of six models and got similar results. All six tools, NCResNet, CPC2, CPAT, IRSOM, LncFinder, and CPPred, are capable of large-scale (thousands to tens of thousands of sequences) lncRNA identification tasks [149].…”
Section: General Profile for lncRNA Identification Tools (mentioning)
Long noncoding RNAs (lncRNAs) make up a large proportion of the transcriptome in eukaryotes and have been shown to have many regulatory functions in various biological processes. When studying lncRNAs, the first step is to accurately and specifically distinguish them from the vast and compositionally complex transcriptome data, which contain mRNAs, lncRNAs, small RNAs and their primary transcripts. Faced with such huge and steadily expanding transcriptome data, in-silico approaches provide a practicable scheme for filtering out lncRNA targets effectively and rapidly, using machine learning and probability statistics. In this review, we mainly discussed the characteristics of the algorithms and features of currently developed approaches. We also outlined the traits of some state-of-the-art tools for ease of operation. Finally, we pointed out the underlying challenges in lncRNA identification with the advent of new experimental data.