In quantitative proteomics work, the differences in expression of many separate proteins are routinely examined to test for significant differences between treatments. This leads to the multiple hypothesis testing problem: when many separate tests are performed many will be significant by chance and be false positive results. Statistical methods such as the false discovery rate method that deal with this problem have been disseminated for more than one decade. However a survey of proteomics journals shows that such tests are not widely implemented in one commonly used technique, quantitative proteomics using two-dimensional electrophoresis. We outline a selection of multiple hypothesis testing methods, including some that are well known and some lesser known, and present a simple With the advent of high throughput genomics approaches, researchers need appropriate bioinformatic and statistical tools to deal with the large amounts of data generated. In quantitative proteomics work, differences in expression of many individual proteins between treatments or samples might need to be tested. Researchers must then address what has come to be known as the multiple hypothesis testing problem. Suppose 500 features such as protein spots in a two-dimensional electrophoresis (2-DE) 1 experiment, or mass spectrum features relating to protein or peptide abundance, are each compared between treatments using a t test. If the conventional a priori significance level of ␣ ϭ 0.05 is used, then 5% or about 25 significant features are expected to occur just by chance even if the null hypothesis of no treatment effect is true for all 500 features. Thus it is easier to make a false positive error when picking out significant results in an experiment with multiple features, than when considering one feature in isolation.A variety of statistical methods have been devised to deal with the multiple hypothesis testing problem. These are applicable in quantitative proteomics. In this paper we use examples from 2-DE proteomics to demonstrate these methods. In this technique, the intensity of signal from protein spots on 2-DE gels is measured and compared between gels. Use of the word "spot" is obviously not synonymous with use of the word "protein" in that it does not encompass all forms of a given protein such as alternatively spliced variants and posttranslational modification variants that might form spots in different positions on the gel. The multiple testing approach is introduced with the following example. Table I shows simulated data for a model of a 2-DE proteomics experiment in which 500 spots have been compared between two treatments using the t test. The third column gives p values significant at ␣ ϭ 0.05 sorted from low to high. A threshold line is shown drawn under spot 70. This has been selected arbitrarily for illustration of some properties of a threshold. The p values for the spots above the threshold are all less than ␣ ϭ 0.05 but we cannot declare them to be significant at the ␣ ϭ 0.05 level because of the multiple hypot...