Cancer is a multistep process characterized by altered signal transduction, cell growth, and metabolism. To identify such processes in early carcinogenesis we use an information theoretic approach to characterize gene expression quantified as mRNA levels in primary keratinocytes (K) and human papillomavirus 16 (HPV16)-transformed keratinocytes (HF1 cells) from early (E) and late (L) passages and from benzo(a)pyrene-treated (BP) L cells. Our starting point is that biological signaling processes are subjected to the same quantitative laws as inanimate, nonequilibrium chemical systems. Environmental and genomic constraints thereby limit the maximal thermodynamic entropy that the biological system can reach. The procedure uncovers the changes in gene expression patterns in different networks and defines the significance of each altered network in the establishment of a particular phenotype. The development of transformed HF1 cells is shown to be represented by one major transcription pattern that is important at all times. Two minor transcription patterns are also identified, one that contributes at early times and a distinguishably different pattern that contributes at later times. All three transcription patterns defined by our analysis were validated by gene expression values and biochemical means. The major transcription pattern includes reduced transcripts participating in the apoptotic network and enhanced transcripts participating in cell cycle, glycolysis, and oxidative phosphorylation. The two minor patterns identify genes that are mainly involved in lipid or carbohydrate metabolism.

microarray analysis | oncogenic transformation | surprisal analysis | maximal entropy | gene transcription patterns

Gene expression profiling describes the transcription patterns of thousands of mRNAs at the same time point, allowing insight into or comparison of different cellular conditions.
Regulation of gene expression is relevant to many areas of biology and medicine, including the study of different diseases and specifically cancer. To cope with the massive amount of available microarray data [see, for example, the Gene Expression Omnibus (GEO) database], many software packages have been developed (1). These techniques identify a list of "interesting" genes and search for their biological relevance. The techniques used for analysis of microarray data can identify networks that have been changed under each condition. However, with these techniques it is not possible to delineate the significance of such overall changes to the different transcription patterns that are associated with the different phenotypes. Here we propose and apply a physically motivated global method of gene expression analysis that seeks to uncover both the changes in expression patterns of different networks and the significance of each altered network in the establishment of each particular phenotype.

Cancer is an evolving, complex system, which goes through several stages before full malignancy. To demonstrate the application of our method we compare gene expression between...