Multiple testing corrections are a useful tool for restricting the FDR, but they can be blunt in the context of low power, as we demonstrate with a series of simple simulations. Unfortunately, low power is common in proteomics experiments, driven by proteomics-specific issues such as small effects due to ratio compression and few replicates due to high reagent cost, limited instrument time availability, and other constraints. In such situations, most multiple testing correction methods, if used with conventional thresholds, will fail to detect any true positives even when many exist. In this low power, medium scale situation, other approaches such as effect size considerations or peptide-level calculations may be more effective, even if they do not offer the same theoretical guarantee of a low FDR. We therefore aim to highlight in this article that proteomics presents some specific challenges to standard multiple testing correction methods, which should be employed as a useful tool but not regarded as a required rubber stamp.

Keywords: FDR / Multiple testing corrections / Shotgun proteomics

Viewpoint

Multiple testing corrections come in many "flavors." They are employed to limit the number of false positives occurring by chance when an analysis is repeated many times, and thus to reduce the FDR at the analysis level. In proteomics they were initially borrowed from microarray research and other high-throughput areas, where they quickly became the norm. The informative review by Diz [1] shows that multiple testing corrections were seldom used in quantitative proteomics until relatively recently, and recommends a sensible multimethod approach. Here, we suggest that one reason for the slower uptake of such methods in proteomics experiments, despite their ease of use and theoretical appeal, is that they remain a useful but blunt tool that is less effective in discovery proteomics than in, for example, the microarray environment. In this paper, we describe five key factors that combine to make multiple testing corrections less effective in proteomics: medium problem scale; lower effect size due to possible ratio compression; lower analysis power due to high cost; the percentage of data showing an effect; and data distribution quirks. We then discuss some simple alternatives that can help reduce, or at least understand, the FDR in this medium scale, low power situation.
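To make the low power argument concrete, here is a minimal simulation sketch in the spirit of (but not reproduced from) the article: 2000 proteins, 5% of which carry a small true effect, three replicates per group, with the Benjamini-Hochberg step-up procedure applied at a conventional 5% threshold. All parameter values are illustrative assumptions.

```python
# Minimal sketch: Benjamini-Hochberg under low power (assumed parameters,
# not the article's actual simulation code).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_proteins = 2000     # medium problem scale, typical of shotgun proteomics
frac_changed = 0.05   # 5% of proteins carry a true effect
effect = 0.5          # small log2 fold change, e.g. after ratio compression
n_reps = 3            # few replicates due to cost / instrument time

truly_changed = rng.random(n_proteins) < frac_changed
pvals = np.empty(n_proteins)
for i in range(n_proteins):
    shift = effect if truly_changed[i] else 0.0
    a = rng.normal(0.0, 1.0, n_reps)    # control replicates
    b = rng.normal(shift, 1.0, n_reps)  # treated replicates
    pvals[i] = stats.ttest_ind(a, b).pvalue

# Benjamini-Hochberg step-up at a conventional 5% FDR threshold:
# reject the k smallest p-values, where k is the largest rank with
# p(k) <= (k/m) * alpha.
ranked = np.sort(pvals)
threshold = 0.05 * np.arange(1, n_proteins + 1) / n_proteins
passing = ranked <= threshold
k = np.max(np.nonzero(passing)[0]) + 1 if passing.any() else 0
print(f"True positives present: {truly_changed.sum()}, "
      f"BH discoveries at 5% FDR: {k}")
```

With these assumed settings the per-test power is very low, so the procedure typically reports zero discoveries despite roughly one hundred true effects being present, which is the failure mode the article describes.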
Western blotting as an orthogonal validation tool for quantitative proteomics data has rapidly become a de facto requirement for publication. In this viewpoint article, the pros and cons of western blotting as a validation approach are discussed, using examples from our own published work, and how best to apply it to improve the quality of published data is outlined. Further, suggestions and guidelines are provided for other experimental approaches that can be used to validate quantitative proteomics data in addition to, or in place of, western blotting.
Proteomics, as a high-throughput technology, has been developed with the aim of investigating the maximum number of proteins in cells. However, protein discovery and data generation vary in depth and coverage when different technical strategies are used. In this study, four different sample preparation and peptide or protein fractionation methods were applied to identify and quantify proteins from log-phase yeast lysate: sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE); gas phase fractionation (GPF); filter-aided sample preparation (FASP)-GPF; and FASP-high pH reversed phase fractionation (HpH). Fractionated samples were initially analyzed and compared using nanoflow liquid chromatography-tandem mass spectrometry (LC-MS/MS) employing data dependent acquisition on a linear ion trap instrument. The number of fractions and replicates was adjusted so that each experiment used a similar amount of mass spectrometric instrument time, approximately 16 hours. A second set of experiments was performed using a Q Exactive Orbitrap instrument, comparing FASP-GPF, SDS-PAGE and FASP-HpH. Compared with results from the linear ion trap mass spectrometer, the Q Exactive Orbitrap mass spectrometer enabled a small increase in protein identifications using the SDS-PAGE and FASP-GPF methods, and a large increase using FASP-HpH. A major advantage of the higher resolution instrument found in this study was the substantial increase in peptide identifications, which enhanced proteome coverage. A total of 1035, 1357 and 2134 proteins were identified by FASP-GPF, SDS-PAGE and FASP-HpH, respectively. Combining results from the Orbitrap experiments, a total of 2269 proteins were found, with 94% of them identified using the FASP-HpH method. Therefore, the FASP-HpH method is the optimal choice among these approaches when using a high resolution spectrometer on this type of sample.
An experimentally-derived measure of inter-replicate variation in reference samples: the same-same permutation methodology

Abstract

The multiple testing problem is a well-known statistical stumbling block in high-throughput data analysis, where large scale repetition of statistical methods introduces unwanted noise into the results. While approaches exist to overcome the multiple testing problem, these methods focus on theoretical statistical clarification rather than incorporating experimentally-derived measures to ensure appropriately tailored analysis parameters. Here, we introduce a method for estimating inter-replicate variability in reference samples for a quantitative proteomics experiment using permutation analysis. This can function as a modulator to multiple testing corrections such as the Benjamini-Hochberg ordered Q value test. We refer to this as a 'same-same' analysis, since this method incorporates the use of six biological replicates of the reference sample and determines, through non-redundant triplet pairwise comparisons, the level of quantitative noise inherent within the system. The method can be used to produce an experiment-specific Q value cut-off that achieves a specified false discovery rate at the quantitation level, such as 1%. The same-same method is applicable to any experimental set that incorporates six replicates of a reference sample. To facilitate access to this approach, we have developed a same-same analysis R module that is freely available and ready to use via the internet.

Keywords: Label-free shotgun proteomics, false discovery rates, data quality, data validation, statistics
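The following is a minimal Python sketch of the same-same idea as described above, not the authors' released R module: it splits six reference replicates into all C(6,3)/2 = 10 non-redundant 3-versus-3 comparisons and pools the per-protein p-values as an empirical noise baseline. The use of t-tests as the per-protein statistic and the matrix layout are our assumptions.

```python
# Hedged sketch of a 'same-same' permutation analysis (our reading of the
# abstract, not the published R module).
from itertools import combinations
import numpy as np
from scipy import stats

def same_same_pvalues(ref: np.ndarray) -> np.ndarray:
    """ref: (n_proteins, 6) quantitation matrix of six reference replicates.
    Returns pooled p-values from all non-redundant 3-vs-3 splits."""
    cols = set(range(6))
    seen, pooled = set(), []
    for group_a in combinations(range(6), 3):
        group_b = tuple(sorted(cols - set(group_a)))
        if group_b in seen:  # skip the mirror image of an earlier split
            continue
        seen.add(group_a)
        p = stats.ttest_ind(ref[:, list(group_a)],
                            ref[:, list(group_b)], axis=1).pvalue
        pooled.append(p)
    return np.concatenate(pooled)  # 10 splits x n_proteins p-values

# Pure-noise reference data should yield a roughly uniform p-value pool;
# its empirical quantiles can anchor an experiment-specific cut-off.
rng = np.random.default_rng(1)
noise = same_same_pvalues(rng.normal(size=(500, 6)))
print(f"{(noise < 0.05).mean():.1%} of same-same comparisons fall below p=0.05")
```

Because both sides of every split come from the same reference condition, any "significant" result in this pool is by construction quantitative noise, which is what makes the distribution usable as a baseline for setting an experiment-specific Q value cut-off.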
We randomly selected 100 journal articles published in five proteomics journals in 2019 and manually examined each of them against a set of 13 criteria concerning the statistical analyses used, all of which were based on items mentioned in the journals' instructions to authors. This included questions such as whether a pilot study was conducted and whether false discovery rate calculation was employed at either the quantitation or identification stage. These data were then transformed to binary inputs, analyzed via machine learning algorithms, and classified accordingly, with the aim of determining whether clusters of data existed for specific journals or whether certain statistical measures correlated with each other. We applied a variety of classification methods, including principal component analysis decomposition, agglomerative clustering, and multinomial and Bernoulli naïve Bayes classification, and found that none of these could readily determine journal identity given the extracted statistical features. Logistic regression was useful in determining high correlative potential between statistical features such as false discovery rate criteria and multiple testing correction methods, but was similarly ineffective at determining correlations between statistical features and specific journals. This meta-analysis highlights that there is a very wide variety of approaches being used in statistical analysis of proteomics data, many of which do not conform to published journal guidelines, and that, contrary to implicit assumptions in the field, there are no clear correlations between statistical methods and specific journals.
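For readers unfamiliar with this workflow, a hypothetical sketch of one of the named classifiers follows: Bernoulli naïve Bayes cross-validated on a 100 × 13 binary feature matrix. The feature values and journal labels here are randomly generated stand-ins, so accuracy should land near chance, mirroring the paper's finding that statistical features do not predict journal identity.

```python
# Hypothetical sketch of journal classification from binary statistical-
# practice features; the data here are invented, not the study's dataset.
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.integers(0, 2, size=(100, 13))  # 100 articles x 13 binary criteria
y = rng.integers(0, 5, size=100)        # journal labels (0..4)

clf = BernoulliNB()
scores = cross_val_score(clf, X, y, cv=5)
print(f"Mean CV accuracy: {scores.mean():.2f} (chance is ~0.20 for 5 classes)")
# Accuracy near chance suggests the binary features carry little
# information about journal identity.
```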
PeptideWitch is a Python-based web module that introduces several key graphical and technical improvements to the Scrappy software platform, which is designed for label-free quantitative shotgun proteomics analysis using normalised spectral abundance factors. The program inputs are low stringency protein identification lists output from peptide-to-spectrum matching search engines for ‘control’ and ‘treated’ samples. Through a combination of spectral count summation and inner joins, PeptideWitch processes low stringency data and outputs high stringency data suitable for downstream quantitation. Data quality metrics are generated, and a series of statistical analyses and graphical representations are presented, aimed at defining and presenting the differences between the two sample proteomes.
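As a rough illustration of the quantitation step this abstract describes, here is a minimal sketch, assumed rather than taken from PeptideWitch's actual code, that inner-joins per-protein spectral counts for control and treated samples and computes normalised spectral abundance factors. The NSAF formula, NSAF_i = (SpC_i/L_i) / Σ_j (SpC_j/L_j), is the standard one for this approach; the column names and toy data are invented.

```python
# Minimal sketch of NSAF quantitation with an inner join (assumed workflow,
# not PeptideWitch's implementation).
import pandas as pd

def nsaf(df: pd.DataFrame) -> pd.Series:
    """df needs 'spectral_counts' (summed per protein) and 'length' columns."""
    saf = df["spectral_counts"] / df["length"]
    return saf / saf.sum()

# Toy inputs standing in for summed low-stringency search-engine output
control = pd.DataFrame({"protein": ["P1", "P2", "P3"],
                        "spectral_counts": [30, 12, 5],
                        "length": [450, 300, 220]}).set_index("protein")
treated = pd.DataFrame({"protein": ["P1", "P2", "P4"],
                        "spectral_counts": [14, 25, 9],
                        "length": [450, 300, 180]}).set_index("protein")

# Inner join keeps only proteins identified in both samples, which is one
# simple way to raise stringency before quantitation.
merged = control.join(treated, how="inner", lsuffix="_ctl", rsuffix="_trt")
merged["nsaf_ctl"] = nsaf(control).loc[merged.index]
merged["nsaf_trt"] = nsaf(treated).loc[merged.index]
print(merged[["nsaf_ctl", "nsaf_trt"]])
```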