2020
DOI: 10.1038/s41598-020-75005-9

Seq-SymRF: a random forest model predicts potential miRNA-disease associations based on information of sequences and clinical symptoms

Abstract: Increasing evidence indicates that miRNAs play a vital role in biological processes and are closely related to various human diseases. Research on miRNA-disease associations is helpful not only for disease prevention, diagnosis and treatment, but also for new drug identification and lead compound discovery. A novel sequence- and symptom-based random forest algorithm model (Seq-SymRF) was developed to identify potential associations between miRNA and disease. Features derived from sequence information and clini…

Cited by 12 publications (2 citation statements)
References 33 publications
“…Considering that unlabeled samples may contain potential positive samples (i.e., DDIs that have not been verified by experiments), randomly selecting negative samples from unlabeled ones may affect the prediction performance of the model. Therefore, the positive unlabeled learning method proposed by Li et al. [50] was adopted to select reliable negative samples. Steps are as follows: (1) use the fingerprint feature of the drug to calculate the mean of each dimension in the positive samples and form a 2048-dimensional feature vector (i.e., cluster center); (2) calculate the Euclidean distance between all unlabeled samples and the cluster center, and the average Euclidean distance (AED); (3) set a threshold D (D = n × AED, n ∈ ℝ⁺), and an unlabeled sample can be regarded as a reliable negative sample if the distance from the unlabeled one to the cluster center is higher than the threshold.…”
Section: Methods (mentioning)
confidence: 99%
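The three steps quoted above can be sketched in a few lines of NumPy. This is only an illustrative reading of the quoted description, not the cited authors' code: the function name `select_reliable_negatives`, the default `n=1.0`, and the toy 2-dimensional vectors (standing in for the 2048-dimensional fingerprints) are all assumptions made for the example.

```python
import numpy as np


def select_reliable_negatives(positives, unlabeled, n=1.0):
    """Pick unlabeled samples that lie far from the positive-class centroid.

    positives, unlabeled: 2-D arrays of shape (num_samples, num_features).
    n: positive multiplier applied to the average Euclidean distance (AED).
    Returns (indices of reliable negatives in `unlabeled`, threshold D).
    """
    positives = np.asarray(positives, dtype=float)
    unlabeled = np.asarray(unlabeled, dtype=float)

    # Step 1: per-dimension mean of the positive samples (the cluster center).
    center = positives.mean(axis=0)

    # Step 2: Euclidean distance from each unlabeled sample to the center,
    # and the average of those distances (AED).
    distances = np.linalg.norm(unlabeled - center, axis=1)
    aed = distances.mean()

    # Step 3: threshold D = n * AED; keep samples strictly farther than D.
    threshold = n * aed
    reliable = np.flatnonzero(distances > threshold)
    return reliable, threshold


# Toy data (hypothetical): the positive centroid is (1, 0).
toy_positives = [[0.0, 0.0], [2.0, 0.0]]
toy_unlabeled = [[1.0, 0.0], [1.0, 3.0], [5.0, 0.0]]  # distances 0, 3, 4

indices, d = select_reliable_negatives(toy_positives, toy_unlabeled, n=1.0)
print(indices, d)
```

With the toy data the distances are 0, 3 and 4, so the AED is 7/3 and the second and third unlabeled samples exceed the threshold. A larger `n` makes the selection stricter and returns fewer, more distant negatives.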
“…They constructed the multi-network of miRNA, long non-coding RNA (lncRNA) and disease to propagate label that could be utilized to infer unknown interaction information of miRNA-disease. Recently, Li et al. [24] predicted miRNA-disease connections by utilizing the modified random forest algorithm that relied on sequence information and symptom information in Seq-SymRF model. Moreover, Euclidean distance-based clustering method was applied to choose reliable negative samples in this model.…”
Section: Introduction (mentioning)
confidence: 99%