Kun Yang scite author profile

A stable gene selection in microarray data analysis

Background: Microarray data analysis is notorious for involving a huge number of genes compared to a relatively small number of samples. Gene selection aims to detect the most significantly differentially expressed genes under different conditions, and it has been a central research focus. In general, a better gene selection method can improve classification performance significantly. One of the difficulties in gene selection is that the numbers of samples under different conditions can vary widely. Results: Two novel gene selection methods are proposed in this paper; they are not affected by unbalanced sample class sizes and do not assume any explicit statistical model on the gene expression values. They were evaluated on eight publicly available microarray datasets, using leave-one-out cross-validation and 5-fold cross-validation. Performance is measured by the classification accuracy obtained with the top-ranked genes selected on the training datasets. Conclusion: The experimental results showed that the proposed gene selection methods are efficient, effective, and robust in identifying differentially expressed genes. With existing SVM-based and KNN-based classifiers, the genes selected by the proposed methods in general give more accurate classification results, particularly when the sample class sizes in the training dataset are unbalanced.
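The evaluation protocol described in this abstract (rank genes on the training data only, then classify held-out samples with the top-ranked genes) can be sketched roughly as follows. This is only an illustration under stated assumptions: the ranking score is a simple standardized mean difference used as a stand-in, not either of the two proposed methods; a 1-nearest-neighbour rule stands in for the KNN classifier; and the data are synthetic with deliberately unbalanced class sizes. The names `rank_genes` and `loocv_accuracy` are hypothetical.

```python
import random
import statistics


def rank_genes(samples, labels):
    """Rank gene indices by a stand-in score: |mean difference| / summed spread."""
    n_genes = len(samples[0])
    scores = []
    for g in range(n_genes):
        a = [s[g] for s, lab in zip(samples, labels) if lab == 0]
        b = [s[g] for s, lab in zip(samples, labels) if lab == 1]
        spread = statistics.pstdev(a) + statistics.pstdev(b) + 1e-9
        scores.append(abs(statistics.fmean(a) - statistics.fmean(b)) / spread)
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)


def loocv_accuracy(samples, labels, top_k):
    """Leave-one-out CV; gene selection is redone inside every training split."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        # The held-out sample never influences which genes are selected.
        genes = rank_genes(train_x, train_y)[:top_k]

        def dist(u, v):
            return sum((u[g] - v[g]) ** 2 for g in genes)

        nearest = min(range(len(train_x)), key=lambda j: dist(samples[i], train_x[j]))
        correct += train_y[nearest] == labels[i]
    return correct / len(samples)


# Synthetic data (assumption): 50 genes, only genes 0-4 differ between classes,
# with unbalanced class sizes of 20 versus 6.
rng = random.Random(0)
labels = [0] * 20 + [1] * 6
samples = [
    [rng.gauss(3.0 if (lab == 1 and g < 5) else 0.0, 1.0) for g in range(50)]
    for lab in labels
]
accuracy = loocv_accuracy(samples, labels, top_k=5)
print(f"LOOCV accuracy with top 5 genes: {accuracy:.2f}")
```

Redoing the ranking inside each training split matters: selecting genes on the full dataset first would let the held-out sample leak into the selection and inflate the accuracy estimate.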

The impact of sample imbalance on identifying differentially expressed genes

Yang, Gao. 2006. BMC Bioinformatics

Background: Several statistical methods have recently been proposed to identify genes with differential expression between two conditions. However, very few studies consider the problem of sample imbalance, and no study has investigated the impact of sample imbalance on identifying differentially expressed genes. In addition, it is not clear which method is more suitable for unbalanced data.

More discussions for granger causality and new causality measures

Cao, Zhang, et al. 2011. Cogn Neurodyn

Granger causality (GC) has been widely applied in economics and neuroscience to reveal causal influence between time series. In our previous paper (Hu et al., in IEEE Trans on Neural Netw, 22(6), pp. 829-844, 2011), we proposed new causality measures in the time and frequency domains, focusing particularly on the frequency domain, by pointing out the shortcomings of GC and Granger-alike causality metrics and the advantages of the new causality. In this paper we continue that discussion and focus on new causality and GC or Granger-alike causality metrics in the time domain. Although one strong motivation was introduced in our previous paper (Hu et al., in IEEE Trans on Neural Netw, 22(6), pp. 829-844, 2011), we present additional motivation for the proposed new causality metric here and restate the previous motivation for completeness. We point out one property of conditional GC in the time domain and its limitation: it cannot reveal the real strength of directional causality among three time series. We also show the limitations of directed causality (DC) and normalized DC for multivariate time series and demonstrate that they cannot reveal real causality at all. By calculating GC and new causality values for an example, we demonstrate that the influence of one time series on the other increases linearly as the coupling strength increases linearly, which further supports the reasonableness of the new causality metric. We point out that larger instantaneous correlation does not necessarily mean larger true causality (e.g., GC and new causality), or vice versa. Finally, we analyze the statistical significance test and the asymptotic distribution of the new causality metric through illustrative examples.
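To make the time-domain GC quantity concrete, the sketch below computes the standard log-ratio form of Granger causality, ln(restricted residual sum of squares / full residual sum of squares), for a simulated pair of series at several coupling strengths. Everything about the setup is an assumption for illustration: a single lag, no intercept, Gaussian noise, an autoregressive coefficient of 0.5, and the hypothetical names `granger_y_to_x` and `simulate`. The new causality measure of Hu et al. is not implemented, since its definition is not given here.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def granger_y_to_x(x, y):
    """Time-domain Granger causality from y to x (one lag, no intercept)."""
    x_now, x_lag, y_lag = x[1:], x[:-1], y[:-1]
    # Restricted model: x_t = a * x_{t-1} + e_t
    a_r = dot(x_lag, x_now) / dot(x_lag, x_lag)
    rss_r = sum((xn - a_r * xl) ** 2 for xn, xl in zip(x_now, x_lag))
    # Full model: x_t = a * x_{t-1} + b * y_{t-1} + e_t,
    # solved from the 2x2 normal equations with Cramer's rule.
    sxx, sxy, syy = dot(x_lag, x_lag), dot(x_lag, y_lag), dot(y_lag, y_lag)
    bx, by = dot(x_lag, x_now), dot(y_lag, x_now)
    det = sxx * syy - sxy * sxy
    a_f = (bx * syy - by * sxy) / det
    b_f = (by * sxx - bx * sxy) / det
    rss_f = sum(
        (xn - a_f * xl - b_f * yl) ** 2
        for xn, xl, yl in zip(x_now, x_lag, y_lag)
    )
    return math.log(rss_r / rss_f)


def simulate(coupling, n=2000, seed=0):
    """x_t = 0.5 * x_{t-1} + coupling * y_{t-1} + noise, with y white noise."""
    rng = random.Random(seed)
    x, y = [0.0], [rng.gauss(0, 1)]
    for _ in range(n - 1):
        x.append(0.5 * x[-1] + coupling * y[-1] + rng.gauss(0, 1))
        y.append(rng.gauss(0, 1))
    return x, y


for c in (0.0, 0.2, 0.4, 0.8):
    print(f"coupling={c:.1f}  GC(y->x)={granger_y_to_x(*simulate(c)):.4f}")
```

In this toy model the GC value is expected to grow with the coupling strength and to stay near zero when the coupling is zero.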

A Comparative Study of Two Reference Estimation Methods in EEG Recording

Cao, Chen, et al. 2012

In [1] we proposed two methods to identify the reference electrode signal under the key assumption that the reference signal is independent of the EEG sources. This assumption is shown to be plausible for intracranial EEG with a scalp reference. In this paper, we theoretically prove that the reference signal obtained using the second method in [1], or the equivalent MPDR approach [2], outperforms the widely used average reference (AR) if the real reference is independent of the EEG sources. The simulation results confirm the advantages over AR.
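For context, the average reference (AR) baseline mentioned in this abstract can be sketched as below. The sketch assumes each recorded channel is a source potential minus a shared reference signal (this sign convention, the toy numbers, and the function name `average_reference` are assumptions). AR subtracts the instantaneous mean across channels, which cancels the shared reference but leaves each channel offset by the mean of the source potentials. The methods of [1] and the MPDR approach are not reproduced here.

```python
def average_reference(recordings):
    """Subtract, at every time sample, the mean across channels.

    recordings: list of channels, each a list of samples of equal length.
    """
    n_channels = len(recordings)
    n_samples = len(recordings[0])
    means = [
        sum(channel[t] for channel in recordings) / n_channels
        for t in range(n_samples)
    ]
    return [[channel[t] - means[t] for t in range(n_samples)] for channel in recordings]


# Toy example (assumption): three source potentials and one shared reference.
sources = [[1.0, 2.0, 0.0], [-1.0, 0.5, 1.0], [0.0, -2.5, 2.0]]
reference = [5.0, -3.0, 0.7]
recorded = [[s - r for s, r in zip(channel, reference)] for channel in sources]

rereferenced = average_reference(recorded)
for channel in rereferenced:
    print([round(v, 3) for v in channel])
```

Because the shared reference cancels in the subtraction, the result is the same whatever `reference` contains; what remains is the source potentials minus their cross-channel mean, which is the residual error AR cannot remove.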

Determining the repeat number of cross-validation

Yang, Wang, Dai, et al. 2011

Cross-validation is probably the most popular approach for estimating the classification error rate when classifying gene expression data. To reduce the variance of the estimate, the cross-validation procedure is repeated and the results are averaged. However, the number of repetitions is generally set to an empirical value. This paper proposes two methods (FCI and TSE) for determining the repeat number of cross-validation based on an approximate confidence interval. The experimental results on real data show that choosing the repeat number empirically is usually unreliable, and that the proposed methods can determine the repeat number needed to achieve a pre-specified precision of the error rate. Furthermore, both methods adjust automatically to changes in the data, the number of folds k, and the classification model.
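The general idea of tying the repeat number to a confidence interval can be illustrated with a simple sequential stopping rule: keep repeating k-fold cross-validation until the normal-approximation confidence interval of the mean error rate is narrower than a chosen half-width. This is a generic sketch, not the FCI or TSE procedure from the paper, whose details are not given here. The toy 1-D nearest-centroid classifier, the synthetic data, and the names and parameters (`repeats_needed`, `half_width`, `min_repeats`, `max_repeats`) are all assumptions.

```python
import random
import statistics


def kfold_error(xs, ys, k, rng):
    """Error rate from one run of k-fold CV with a toy nearest-centroid classifier."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    errors = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        # Class centroids are estimated from the training part only.
        centroids = {
            label: statistics.fmean(xs[i] for i in train if ys[i] == label)
            for label in set(ys[i] for i in train)
        }
        for i in fold:
            pred = min(centroids, key=lambda c: abs(xs[i] - centroids[c]))
            errors += pred != ys[i]
    return errors / len(xs)


def repeats_needed(xs, ys, k=5, half_width=0.01, z=1.96,
                   min_repeats=5, max_repeats=1000, seed=0):
    """Repeat CV until z * stdev / sqrt(n) <= half_width (or max_repeats is hit)."""
    rng = random.Random(seed)
    rates = []
    while len(rates) < max_repeats:
        rates.append(kfold_error(xs, ys, k, rng))
        n = len(rates)
        if n >= min_repeats:
            margin = z * statistics.stdev(rates) / n ** 0.5
            if margin <= half_width:
                break
    return len(rates), statistics.fmean(rates)


# Synthetic two-class data (assumption): 1-D Gaussians with shifted means.
data_rng = random.Random(1)
xs = [data_rng.gauss(0.0, 1.0) for _ in range(30)] + [data_rng.gauss(1.5, 1.0) for _ in range(30)]
ys = [0] * 30 + [1] * 30

n_repeats, mean_error = repeats_needed(xs, ys)
print(f"stopped after {n_repeats} repeats, mean error rate {mean_error:.3f}")
```

A rule of this kind adapts to the data and the classifier because the stopping point depends on the observed spread of the repeated estimates rather than on a fixed repeat count.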

