High-throughput technologies, including gene-expression microarrays, hold great promise for the systems-level study of biological processes. Yet, challenges remain in comparing microarray data from different sources and extracting information about low-abundance transcripts. We demonstrate that these difficulties arise from limitations in the modeling of the data. We propose a physically motivated approach for estimating gene-expression levels from microarray data, an approach neglected in the microarray literature. We separately model the noises specific to sample amplification, hybridization, and fluorescence detection, combining these into a parsimonious description of the variability sources in a microarray experiment. We find that our model produces estimates of gene expression that are reproducible and unbiased. While the details of our model are specific to gene-expression microarrays, we argue that the physically grounded modeling approach we pursue is broadly applicable to other molecular biology technologies.

process modeling | statistical power

One thousand manuscripts involving microarray technology are published each year.† In spite of the 15-year history of the field, those manuscripts still describe a wide variety of data analysis methods, many of them poorly specified. Indeed, criticisms of the validity and reproducibility of microarray experiments have dogged the technology since its inception. There are two possible explanations for these shortcomings: (i) inherent limitations of the microarray technology that constrain its utility or (ii) modeling strategies that are not appropriate. The former is potentially a fundamental problem that can be overcome only with technological advances.
This hypothesis has led to candid speculation that emerging sequencing technologies will quickly replace microarrays as the de facto genome-wide expression analysis technique (1, 2).

An alternative view is that current shortcomings result from gaps in our understanding of how to model the data generated in microarray experiments. To pursue this point, let us consider the motivation for the "standard" model (3). The fluorescence intensity $F_i$ (Fig. 1A) detected at a spot $i$ is surmised to be the sum of a background term $B_i$ and a term related to the expression level $E_i$ we want to estimate. Oddly, the standard model assumes that $B_i$ can be directly determined from the fluorescence intensity measured in the nonfeature region surrounding the spot.‡ The dependence on $E_i$ is assumed to be distorted by multiplicative noise (3). These assumptions yield

$$F_i = B_i + A_i E_i \, e^{\nu_{sp}},$$

where $\nu_{sp}$ is normally distributed with zero mean, and $A_i$ is a parameter capturing the effects of hybridization efficiency and dye-specific and experiment-specific factors. Because systematic effects on the value of $A_i$ are difficult to estimate, microarray experiments are frequently performed with an internal control, the goal being to determine the change of expression $R_i$ between two conditions, 1 and 2, instead of the expression level for each condition:whe...