2020
DOI: 10.1007/s42979-020-00210-2

Improvements in the Large p, Small n Classification Issue

Abstract: Classifying gene expression data is known to hold keys to solving fundamental problems in cancer studies. However, classification is a complex task because of the large p, small n issue in gene expression data analysis. In this paper, we propose improvements in large p, small n classification for the study of human cancer. First, a new method that enlarges the sample size with a generative adversarial network is proposed to improve classification algorithms. Second, we suggest a classification approach…

Citations: cited by 15 publications (12 citation statements)
References: 56 publications (60 reference statements)
“…Among the numerous proposed methods adapted to multi-omics data analysis [1, 2], only few are capable of performing feature selection. Most of these methods fail to implement feature selection because multi-omics data analysis has a strong tendency to involve a small number (= n) of samples with a large number (= p) of features, commonly referred to as the large p small n problem [3], posing difficulty for accurate feature selection. Features should have a sufficiently small P value to be selected under the null hypothesis.…”
Section: Introduction (mentioning)
confidence: 99%
“…When p ≫ n, we call the data having the “large-p-small-n” problem. Statistical models could result in poor prediction performance due to over-fitting when training data contain fewer samples compared to the number of features [6]. There are several methods to deal with the “large-p-small-n” problem in Machine Learning (ML), of which feature selection is the most useful.…”
Section: Introduction (mentioning)
confidence: 99%
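
The over-fitting described in the statement above can be illustrated with a minimal sketch. It is not taken from the cited paper or from any of the citing publications: the sample sizes, the random seed, and the choice of a plain least-squares model are all assumptions made for illustration. With more features than samples, a linear model can reproduce the training labels exactly even when features and labels are pure noise, while its error on new samples stays near that of a trivial predictor.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility
n_train, n_test, p = 20, 200, 500  # hypothetical sizes with p much larger than n

# Features and labels are independent noise, so there is nothing to learn.
X_train = rng.standard_normal((n_train, p))
y_train = rng.standard_normal(n_train)
X_test = rng.standard_normal((n_test, p))
y_test = rng.standard_normal(n_test)

# Minimum-norm least-squares fit. With p > n the system is underdetermined,
# so the fitted weights can interpolate the training labels.
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

train_mse = float(np.mean((X_train @ w - y_train) ** 2))
test_mse = float(np.mean((X_test @ w - y_test) ** 2))
baseline_mse = float(np.mean(y_test ** 2))  # error of always predicting 0

print(f"train MSE: {train_mse:.2e}")
print(f"test MSE:  {test_mse:.3f} (predict-zero baseline: {baseline_mse:.3f})")
```

The training error is numerically zero, yet the test error shows that the model has learned nothing. Reducing p through feature selection, or increasing n, are the two directions the statements in this report point to.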
“…Feature selection with multi-omics datasets has always been difficult; numerous proposed methods adapted to multi-omics data analysis (Reel et al., 2021; Subramanian et al., 2020) include a limited number of methods that perform feature selection. Most methods fail to implement feature selection because multi-omics data analysis has a strong tendency to have a large p small n problem (Huynh et al., 2020) to which feature selection is difficult. Large p small n indicates the situation where there is only a small number (= n) of samples with a large number (= p) of features.…”
Section: Introduction (mentioning)
confidence: 99%
“…Among the numerous proposed methods adapted to multi-omics data analysis [1, 2], only few are capable of performing feature selection. Most of these methods fail to implement feature selection because multi-omics data analysis has a strong tendency to involve a small number (= n) of samples with a large number (= p) of features, commonly referred to as the large p small n problem [3], posing difficulty for accurate feature selection. Features should have a sufficiently small P-value to be selected under the null hypothesis.…”
Section: Introduction (mentioning)
confidence: 99%