Gene set analysis of microarray data commonly uses Hotelling's T² to account for the relationship between genes, but the test cannot be applied when the number of samples is smaller than the number of variables, which is common in microarray studies. In this study, we therefore propose shrinkage approaches to estimating the covariance matrix in Hotelling's T², specifically to address the high-dimensionality problem in microarray data. Three shrinkage covariance methods are proposed, referred to as Shrink A, Shrink B and Shrink C. The three proposed shrinkage methods were compared with the Regularized Covariance Matrix Approach (RCMAT) and Kong's Principal Component Analysis (KPCA). The performance of the proposed methods was assessed using several cases of simulated data sets. In many cases, the Shrink A method performed best, followed by the Shrink C and RCMAT methods. In contrast, both the Shrink B and KPCA methods showed relatively poor results. The study contributes to the establishment of a modified multivariate approach to differential gene expression analysis and is expected to be applicable in other areas with similar data characteristics.
DNA microarray technologies allow scientists to profile the expression of genes across related samples. The relationship between genes is analysed using Hotelling's T² as a multivariate test statistic, but a disadvantage of this test in microarray studies is that the number of samples is smaller than the number of variables. Consequently, the sample covariance matrix in Hotelling's T² statistic is not positive definite; it is singular and cannot be inverted. This study explores the potential of the shrinkage approach for estimating the covariance matrix when this high-dimensionality problem arises. In this research, Hotelling's T² statistic is combined with a shrinkage estimate of the covariance matrix to detect significant gene sets. The classical Hotelling's T² multivariate test statistic is used to incorporate the correlation between genes when assessing changes in activity level across biological conditions. The performance of the proposed methods was assessed using real data. The shrinkage covariance matrix approach gave better results for the detection of differentially expressed gene sets than the other methods.
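To illustrate the general idea of combining Hotelling's T² with a shrinkage covariance estimate, here is a minimal Python sketch. It is not the Shrink A, Shrink B or Shrink C estimator, whose definitions are not given in these abstracts. As an assumption for illustration only, it shrinks the pooled sample covariance linearly toward its own diagonal with a fixed, hand-picked intensity `lam`. The function names and the value of `lam` are hypothetical.

```python
import numpy as np


def pooled_covariance(X1, X2):
    """Pooled sample covariance of two groups (rows = samples, columns = genes)."""
    n1, n2 = X1.shape[0], X2.shape[0]
    S1 = np.cov(X1, rowvar=False)
    S2 = np.cov(X2, rowvar=False)
    return ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)


def shrink_to_diagonal(S, lam):
    """Linear shrinkage toward the diagonal of S (illustrative target, assumed).

    For 0 < lam <= 1 and strictly positive variances, the result is positive
    definite even when S itself is singular.
    """
    target = np.diag(np.diag(S))
    return (1.0 - lam) * S + lam * target


def shrunk_hotelling_t2(X1, X2, lam=0.3):
    """Two-sample Hotelling-type T^2 using a shrunken pooled covariance."""
    n1, n2 = X1.shape[0], X2.shape[0]
    diff = X1.mean(axis=0) - X2.mean(axis=0)
    S_shrunk = shrink_to_diagonal(pooled_covariance(X1, X2), lam)
    # Solve S_shrunk * w = diff instead of forming an explicit inverse.
    w = np.linalg.solve(S_shrunk, diff)
    return (n1 * n2) / (n1 + n2) * float(diff @ w)


# Hypothetical gene set: 20 genes, 5 and 6 samples per condition (samples < genes).
rng = np.random.default_rng(0)
X1 = rng.normal(size=(5, 20))
X2 = rng.normal(size=(6, 20))
t2 = shrunk_hotelling_t2(X1, X2, lam=0.3)
```

With fewer samples than genes, the pooled covariance has rank at most n1 + n2 - 2 and cannot be inverted, whereas the shrunken matrix can. Because the shrunken statistic no longer follows the classical F distribution, significance would typically be assessed by another route, such as permuting the sample labels. That step is omitted here.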