SUMMARY: The theoretical principles and practical implementation of a new method for multivariate data analysis, maximum likelihood principal component analysis (MLPCA), are described. MLPCA is an analog to principal component analysis (PCA) that incorporates information about measurement errors to develop PCA models that are optimal in a maximum likelihood sense. The theoretical foundations of MLPCA are initially established using a regression model and extended to the framework of PCA and singular value decomposition (SVD). An efficient and reliable algorithm based on an alternating regression method is described. Generalization of the algorithm allows its adaptation to cases of correlated errors provided that the error covariance matrix is known. Models with intercept terms can also be accommodated. Simulated data and near-infrared spectra, with a variety of error structures, are used to evaluate the performance of the new algorithm. Convergence times depend on the error structure but are typically a few minutes. In all cases, models determined by MLPCA are found to be superior to those obtained by PCA when non-uniform error distributions are present, although the level of improvement depends on the error structure of the particular data set.
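The alternating regression idea in this abstract can be illustrated with a short sketch. It is not the authors' published implementation: it assumes the simplest case of independent errors with a known standard deviation for every element and no intercept term, and the names `mlpca_sketch`, `_ml_project`, `sd`, and `rank` are hypothetical. Each pass fits the rows by weighted least squares against the current loadings, then fits the columns against the updated scores, and stops when the variance-weighted residual sum of squares no longer changes.

```python
import numpy as np

def _ml_project(X, var, basis):
    """Weighted least-squares projection of each row of X onto span(basis).

    X, var: (m, n) data and error variances; basis: (n, p).
    """
    X_hat = np.empty_like(X, dtype=float)
    for i in range(X.shape[0]):
        w = 1.0 / var[i]                       # inverse variances for row i
        BtW = basis.T * w                      # (p, n), i.e. basis^T diag(w)
        scores = np.linalg.solve(BtW @ basis, BtW @ X[i])
        X_hat[i] = basis @ scores
    return X_hat

def mlpca_sketch(X, sd, rank, max_iter=500, tol=1e-10):
    """Rank-`rank` fit of X that reduces the variance-weighted residuals.

    Assumes independent errors with known standard deviations `sd`
    (same shape as X). Returns the fitted matrix and the final objective.
    """
    X = np.asarray(X, dtype=float)
    var = np.asarray(sd, dtype=float) ** 2

    # Start from the ordinary PCA subspace.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:rank].T
    prev = np.inf

    for _ in range(max_iter):
        # Step 1: fit rows with the loadings V held fixed.
        X_hat = _ml_project(X, var, V)
        U, _, _ = np.linalg.svd(X_hat, full_matrices=False)
        U = U[:, :rank]

        # Step 2: fit columns with the column-space basis U held fixed.
        X_hat = _ml_project(X.T, var.T, U).T
        obj = np.sum((X - X_hat) ** 2 / var)
        _, _, Vt = np.linalg.svd(X_hat, full_matrices=False)
        V = Vt[:rank].T

        # Stop when the weighted objective has stopped changing.
        if abs(prev - obj) <= tol * max(obj, 1.0):
            break
        prev = obj

    return X_hat, obj
```

As a sanity check on the sketch, passing uniform standard deviations (for example `np.ones_like(X)`) should reproduce the ordinary truncated SVD of `X`, because the weighted projections then collapse to orthogonal ones. Correlated errors, which the abstract says the generalized algorithm handles, would need full covariance matrices in place of the diagonal weights used here.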
Most cells on earth exist in a quiescent state. In yeast, quiescence is induced by carbon starvation, and exit occurs when a carbon source becomes available. To understand how cells survive in, and exit from, this state, mRNA abundance was examined using oligonucleotide-based microarrays and quantitative reverse transcription-polymerase chain reaction. Cells in stationary-phase cultures exhibited a coordinated response within 5-10 min of refeeding. Levels of >1800 mRNAs increased dramatically (>64-fold), and a smaller group of stationary-phase mRNAs decreased in abundance. Motif analysis of sequences upstream of genes clustered by VxInsight identified an overrepresentation of Rap1p and BUF (RPA) binding sites in genes whose mRNA levels rapidly increased during exit. Examination of 95 strains carrying deletions in stationary-phase genes induced identified 32 genes essential for survival in stationary phase at 37°C. Analysis of these genes suggests that mitochondrial function is critical for entry into stationary phase and that posttranslational modifications and protection from oxidative stress become important later. The phylogenetic conservation of stationary-phase genes, and our findings that two-thirds of the essential stationary-phase genes have human homologues and that many of these homologues are disease related, demonstrate that yeast is a bona fide model system for studying the quiescent state of eukaryotic cells.
Two new approaches to multivariate calibration are described that, for the first time, allow information on measurement uncertainties to be included in the calibration process in a statistically meaningful way. The new methods, referred to as maximum likelihood principal components regression (MLPCR) and maximum likelihood latent root regression (MLLRR), are based on principles of maximum likelihood parameter estimation. MLPCR and MLLRR are generalizations of principal components regression (PCR), which has been widely used in chemistry, and latent root regression (LRR), which has been virtually ignored in this field. Both of the new methods are based on decomposition of the calibration data matrix by maximum likelihood principal component analysis (MLPCA), which has been recently described (Wentzell, P. D.; et al. J. Chemom., in press). By using estimates of the measurement error variance, MLPCR and MLLRR are able to extract the optimum amount of information from each measurement and, thereby, exhibit superior performance over conventional multivariate calibration methods such as PCR and partial least-squares regression (PLS) when there is a nonuniform error structure. The new techniques reduce to PCR and LRR when assumptions of uniform noise are valid. Comparisons of MLPCR, MLLRR, PCR, and PLS are carried out using simulated and experimental data sets consisting of three-component mixtures. In all cases of nonuniform errors examined, the predictive ability of the maximum likelihood methods is superior to that of PCR and PLS, with PLS performing somewhat better than PCR. MLLRR generally performed better than MLPCR, but in most cases the improvement was marginal. The differences between PCR and MLPCR are elucidated by examining the multivariate sensitivity of the two methods.
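One way to picture how error variances could enter the prediction step is the sketch below. It is an illustration under stated assumptions rather than the published MLPCR procedure: the loadings `V` are taken as given (in the abstract they would come from an MLPCA decomposition), errors are assumed independent with known variances, the regression has no intercept, and the function names `ml_scores`, `fit_score_regression`, and `predict` are hypothetical. The scores of a new sample are estimated by variance-weighted least squares instead of the plain orthogonal projection used in PCR, and are then passed through an ordinary regression on the calibration scores.

```python
import numpy as np

def ml_scores(x, var, V):
    """Variance-weighted least-squares scores of one sample on loadings V.

    x: (n,) measurement vector; var: (n,) error variances; V: (n, p).
    """
    w = 1.0 / np.asarray(var, dtype=float)   # inverse-variance weights
    VtW = V.T * w                            # (p, n), i.e. V^T diag(w)
    return np.linalg.solve(VtW @ V, VtW @ x)

def fit_score_regression(T, y):
    """Least-squares regression of y on calibration scores T (no intercept)."""
    coef, *_ = np.linalg.lstsq(T, y, rcond=None)
    return coef

def predict(x, var, V, coef):
    """Predict the property of a new sample from its weighted scores."""
    return ml_scores(x, var, V) @ coef
```

With equal variances and orthonormal loadings, `ml_scores` returns the same values as `V.T @ x`, which mirrors the abstract's statement that the new techniques reduce to their conventional counterparts when the noise is uniform.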