We describe an integrative data-driven mathematical framework that formulates any number of genome-scale molecular biological data sets in terms of one chosen set of data samples, or of profiles extracted mathematically from data samples, designated the "basis" set. By using pseudoinverse projection, the molecular biological profiles of the data samples are least-squares-approximated as superpositions of the basis profiles. Reconstruction of the data in the basis simulates experimental observation of only those cellular states manifest in the data that correspond to the states of the basis. Classifying the data samples according to their reconstruction in the basis, rather than according to their overall measured profiles, maps the cellular states of the data onto those of the basis and gives a global picture of the correlations, and possibly also the causal coordination, of these two sets of states. We illustrate this framework with an integration of yeast genome-scale proteins' DNA-binding data with cell cycle mRNA expression time course data. A novel correlation between DNA replication initiation and RNA transcription during the yeast cell cycle, which might be due to a previously unknown mechanism of regulation, is predicted.

singular value decomposition | generalized singular value decomposition | DNA microarrays | yeast Saccharomyces cerevisiae cell cycle

Recent advances in high-throughput technologies enable the monitoring of molecular biological signals, e.g., mRNA expression levels and proteins' DNA-binding occupancy levels, that correspond to activities of cellular systems, e.g., DNA replication, RNA transcription, and proteins' DNA binding, on a genomic scale. Integrative analysis of these global signals promises to give new insights into cellular mechanisms of regulation, i.e., the global causal coordination of cellular activities.
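The pseudoinverse projection described in the abstract can be illustrated with a minimal NumPy sketch, assuming the basis profiles are stored as the columns of a genes × basis-samples matrix and the data samples as the columns of a genes × data-samples matrix over the same genes. The function names `pseudoinverse_projection` and `classify_by_reconstruction` are hypothetical, the data are synthetic, and the classification rule (assign each sample to the basis profile with the largest absolute coefficient) is an illustrative assumption, not necessarily the rule used in this work.

```python
import numpy as np


def pseudoinverse_projection(basis: np.ndarray, data: np.ndarray):
    """Least-squares-approximate each data sample as a superposition of basis profiles.

    basis: (n_genes, n_basis) array; each column is one basis profile.
    data:  (n_genes, n_samples) array; each column is one data sample.
    Returns the (n_basis, n_samples) superposition coefficients and the
    (n_genes, n_samples) reconstruction of the data in the basis.
    """
    # np.linalg.pinv computes the Moore-Penrose pseudoinverse (via SVD);
    # multiplying by it gives the least-squares coefficients for every sample.
    coeffs = np.linalg.pinv(basis) @ data
    # The reconstruction is the projection of the data onto the span of the basis.
    reconstruction = basis @ coeffs
    return coeffs, reconstruction


def classify_by_reconstruction(coeffs: np.ndarray) -> np.ndarray:
    """Hypothetical rule: label each sample by its dominant basis profile."""
    return np.argmax(np.abs(coeffs), axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    basis = rng.standard_normal((50, 3))  # synthetic: 50 genes, 3 basis profiles
    true_coeffs = np.array([[2.0, 0.1, 0.0, 0.3],
                            [0.0, 1.5, 0.2, 0.1],
                            [0.1, 0.0, 3.0, 0.2]])
    # Synthetic data: known superpositions of the basis plus small noise.
    data = basis @ true_coeffs + 0.01 * rng.standard_normal((50, 4))
    coeffs, recon = pseudoinverse_projection(basis, data)
    print(np.round(coeffs, 2))
    print(classify_by_reconstruction(coeffs))
```

Because the residual `data - recon` is orthogonal to every basis profile, the reconstruction keeps only the part of each measured profile that the basis can express, which is the sense in which it "simulates" observing only the basis-like cellular states.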
Integrative analysis of different types of large-scale molecular biological data requires mathematical tools that can formulate any number of large-scale data sets in terms of a common frame of reference, while reducing the complexity of the data to make them comprehensible (1, 2). These tools should provide data-driven models, or mathematical frameworks, for describing the data, in which the variables, i.e., the patterns that the tools uncover in the data, and the operations, i.e., data reconstruction and classification in the subspaces spanned by these patterns, may represent some biological reality.

Recently we showed that singular value decomposition (SVD) (3, 4) and generalized SVD (GSVD) (5) provide such data-driven frameworks for genome-scale molecular biological data. For example, the variables of SVD, "eigengenes" and corresponding "eigenarrays," in the analyses of yeast Saccharomyces cerevisiae cell cycle time course mRNA expression data (6), and those of GSVD, "genelets" and corresponding "arraylets," in the comparative analysis of yeast and human (7) cell cycle time course mRNA expression data, were shown to correlate with observed genome-scale effects of known cell cycle regulators and...