2019
DOI: 10.1186/s12859-019-3050-8

Automatic discovery of 100-miRNA signature for cancer classification using ensemble feature selection

Abstract: Background: MicroRNAs (miRNAs) are noncoding RNA molecules heavily involved in human tumors, a few of which circulate in the human body. Finding a tumor-associated miRNA signature, that is, the minimum set of miRNAs that must be measured to discriminate both among different types of cancer and between cancer and normal tissue, is of utmost importance. Feature selection techniques from machine learning can help; however, they often provide naive or biased results. R…

citations
Cited by 65 publications
(83 citation statements)
references
References 99 publications
“…This leaves us with a frequency table of 3,827 features (21-bps sequences) with 583 samples (Table 3 (left)). Next, we ran a state-of-the-art feature selection algorithm 37,38, to reduce the sequences needed to identify different virus strain to the bare minimum. Remarkably, we are then able to correctly differentiate all the coronavirus (MERS-CoV, SARS-CoV-2, SARS-CoV-1, etc) samples using only 53 of the original 3,827 sequences, obtaining a 100% accuracy in a 10-fold cross-validation with a simpler and more traditional classifier, such as Logistic Regression.…”
Section: Methods (mentioning)
confidence: 99%
“…We call this dataset NCBI-A, where 68 sequences belong to SARS-CoV-2. Then, we applied the procedure to translate the data into the set of sequence features, and we run the same state-of-the-art feature selection algorithm 37. The result is a list of 10 different sequences (Table 4), for which just checking for their presence is enough to differentiate between SARS-CoV-2 and other viruses in the dataset, with a 100% accuracy.…”
Section: Experiments 2: Validation on the NCBI Dataset (mentioning)
confidence: 99%
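
The presence check described in the statement above can be illustrated with a minimal sketch. It assumes that a "sequence feature" is simply a fixed-length substring (k-mer) of a genome string; the function names and the toy inputs are hypothetical and are not taken from the citing paper, which uses 21-bps sequences.

```python
def kmers(sequence, k=21):
    """Return the set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def presence_vector(sequence, signature, k=21):
    """Return one 0/1 entry per signature k-mer: 1 if it occurs in the sequence."""
    found = kmers(sequence, k)
    return [int(s in found) for s in signature]


# Toy illustration with k=3 instead of 21 (hypothetical data).
toy_genome = "ACGTAC"
toy_signature = ["CGT", "TTT"]
toy_features = presence_vector(toy_genome, toy_signature, k=3)
print(toy_features)
```

A vector like this, computed once per sample, is the kind of binary input a simple classifier could use to separate one virus from the others.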
“…For example, it is known that miRNAs are a key epigenetic mechanism for the control of gene transcription and may act in some cancer types as tumour suppressors and in others as oncogenes (136,137). In effect, miRNAs are emerging as promising therapeutic targets in various types of cancer and as ever more reliable prognostic factors in individualized precision medicine (138,139). The roles of miRNAs in uveal melanoma as important prognostic and diagnostic markers of tumour onset and progression have been confirmed (140).…”
Section: Role of Epigenetics in the Development of Uveal Melanoma (mentioning)
confidence: 99%
“…With the CNN, we uncovered 18,258 features (21-bps length sequences) that allow the network to differentiate between symptomatic and asymptomatic patients. From these features, we ran a feature reduction algorithm [4] to find the 21-bps sequences that gave the maximum mean accuracy using 8 different classifiers (Gradient Boosting, Passive Aggressive, Logistic Regression, Support Vector, Random Forest, Stochastic Gradient Descent, Ridge and Bagging) from the scikit-learn toolbox [5], ultimately obtaining 53 meaningful 21-bps sequences (Fig. 1) with a global accuracy of 97.05%. Fig.…”
mentioning
confidence: 99%
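
Scoring a candidate feature subset by its mean cross-validated accuracy over several classifiers, as the statement above describes, might look roughly like the sketch below. It is not the cited algorithm: it only evaluates one fixed subset, uses four scikit-learn classifiers rather than the eight listed, and runs on synthetic data. The function name `mean_accuracy`, the hyperparameters, and the candidate indices are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def mean_accuracy(X, y, feature_idx, cv=10):
    """Average cross-validated accuracy of several classifiers on a feature subset."""
    X_sub = X[:, feature_idx]  # keep only the candidate columns
    classifiers = [
        LogisticRegression(max_iter=1000),
        RidgeClassifier(),
        SVC(),
        RandomForestClassifier(n_estimators=50, random_state=0),
    ]
    scores = [
        cross_val_score(clf, X_sub, y, cv=cv, scoring="accuracy").mean()
        for clf in classifiers
    ]
    return float(np.mean(scores))


# Synthetic stand-in for a samples-by-features table (hypothetical data).
X, y = make_classification(
    n_samples=200, n_features=30, n_informative=5, random_state=0
)
candidate = [0, 1, 2, 3, 4]  # arbitrary candidate subset, not a selected signature
score = mean_accuracy(X, y, candidate)
print(f"mean 10-fold accuracy over classifiers: {score:.3f}")
```

A feature selection loop would call a scoring function like this repeatedly while shrinking the subset, keeping the smallest subset whose mean accuracy stays acceptable.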