2019
DOI: 10.3390/math7060493

Two-Stage Classification with SIS Using a New Filter Ranking Method in High Throughput Data

Abstract: Over the last decade, high-dimensional data have received considerable attention in bioinformatics. These data increase the likelihood of detecting the most promising novel information. However, high-performance computing has its limits, and overfitting remains a concern. To overcome these issues, alternative strategies need to be explored for detecting the truly important features. A two-stage approach, consisting of a filtering step and a variable selection step, has been receiving attention. Filtering methods are divided into two…

Citations: cited by 7 publications (8 citation statements)
References: 52 publications (56 reference statements)
“…With this setup of high-dimensional data, we simulated three different types of data, each with correlation structures ρ = 0.2, 0.5, and 0.8 respectively. These values show the low, intermediate, and high correlation structures in the datasets which are significantly similar to what we usually see in the gene expression or others among many types of data in the field of bioinformatics [13,52]. At first, the data were divided randomly into training and testing sets with 75% and 25% of samples respectively; 75% of the training data was given to the FS methods, which ranked the genes concerning their importance, and then the top-ranked genes were selected based on b-SIS condition.…”
Section: Simulation Data Setup (supporting)
confidence: 76%
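
The split-then-rank procedure described in this statement can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the citing authors' code: the toy data, the absolute marginal correlation used as the ranking score, and the cutoff `top_k` are all hypothetical stand-ins. The b-SIS condition is not reproduced, because the excerpt does not define it.

```python
import random


def split_indices(n_items, frac=0.75, seed=0):
    """Randomly split range(n_items) into two index lists (frac / 1 - frac)."""
    rng = random.Random(seed)
    idx = list(range(n_items))
    rng.shuffle(idx)
    cut = int(round(frac * n_items))
    return idx[:cut], idx[cut:]


def abs_correlation(x, y):
    """Absolute Pearson correlation; returns 0.0 for a constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def rank_features(X, y, rows):
    """Rank feature indices by marginal correlation with y on the given rows.

    Assumption: marginal correlation is only a placeholder for whichever
    feature selection (FS) method is actually being evaluated.
    """
    ys = [y[i] for i in rows]
    scores = [abs_correlation([X[i][j] for i in rows], ys)
              for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def make_toy_data(n_samples=80, n_features=30, seed=1):
    """Hypothetical data: only features 0 and 1 carry signal."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
    y = [1 if row[0] + row[1] + 0.3 * rng.gauss(0, 1) > 0 else 0 for row in X]
    return X, y


X, y = make_toy_data()

# Step 1: 75% / 25% random split into training and testing samples.
train_idx, test_idx = split_indices(len(X), frac=0.75, seed=0)

# Step 2: hand 75% of the training samples to the FS method.
fs_positions, _ = split_indices(len(train_idx), frac=0.75, seed=2)
fs_rows = [train_idx[pos] for pos in fs_positions]

# Step 3: keep the top-ranked features (top_k is a hypothetical cutoff).
top_k = 5
selected = rank_features(X, y, fs_rows)[:top_k]
print("selected feature indices:", selected)
```

The test rows in `test_idx` are never touched during ranking, which is the point of splitting before feature selection.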
“…From [13], we see that the resampling-based FS is relatively more efficient in comparison to the other existing FS methods in gene expression data. The RLFS method is based on the lasso penalized regression method and the resampling approach employed to obtain the ranked important features using the frequency.…”
Section: The Resampling-based Lasso Feature Selection (mentioning)
confidence: 93%
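
The idea in this statement, fitting a lasso on repeated resamples and ranking features by how often they are selected, might look something like the sketch below. It is an assumption-laden illustration rather than the RLFS implementation: it uses a plain least-squares lasso solved by coordinate descent (the cited method may well use a different loss), and the penalty `lam`, the resample fraction, the number of resamples, and the toy data are all hypothetical choices.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def center(X, y):
    """Center each column of X and the response y (so no intercept is needed)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    my = sum(y) / n
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    return Xc, [v - my for v in y]


def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def resampling_lasso_rank(X, y, lam=0.3, n_resamples=20, frac=0.75, seed=0):
    """Rank features by how often the lasso gives them a nonzero coefficient."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    m = int(frac * n)
    for _ in range(n_resamples):
        rows = rng.sample(range(n), m)  # subsample without replacement
        Xc, yc = center([X[i] for i in rows], [y[i] for i in rows])
        for j, b in enumerate(lasso_cd(Xc, yc, lam)):
            if b != 0:
                counts[j] += 1
    freq = [c / n_resamples for c in counts]
    ranking = sorted(range(p), key=lambda j: freq[j], reverse=True)
    return ranking, freq


# Hypothetical toy data: the response depends only on features 0 and 1.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(20)] for _ in range(60)]
y = [2 * row[0] - 2 * row[1] + 0.5 * data_rng.gauss(0, 1) for row in X]

ranking, freq = resampling_lasso_rank(X, y)
print("top-ranked features:", ranking[:5])
```

Ranking by selection frequency rather than by a single fit's coefficients is what makes the result less sensitive to any one resample.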
“…Some of the existing studies for statistical comparison includes unsupervised clustering and normalization [20,21], supervised feature ranking and classification methods [22,23,19,24,25]. Feature ranking and classification have been extensively utilized in microarray gene expression studies [26,27]. The key difference of DNAm data from gene expression is that the DNAm has continuous variables ranging between 0 and 1.…”
Section: Introduction (mentioning)
confidence: 99%