A frequently sought output from a shotgun proteomics experiment is a list of proteins that we believe to have been present in the analyzed sample before proteolytic digestion. The standard technique to control for errors in such lists is to enforce a preset threshold for the false discovery rate (FDR). Many consider protein-level FDRs a difficult and vague concept, as the measurement entities, spectra, are manifestations of peptides and not proteins. Here, we argue that this confusion is unnecessary and provide a framework for thinking about protein-level FDRs, starting from their basic principle: the null hypothesis. Specifically, we point out that two competing null hypotheses are used concurrently in today's protein inference methods, a fact that has gone unnoticed by many. Using simulations of a shotgun proteomics experiment, we show how confusing one null hypothesis for the other can lead to serious discrepancies in the FDR. Furthermore, we demonstrate how the same simulations can be used to verify FDR estimates of protein inference methods. In particular, we show that, for a simple protein inference method, decoy models can be used to accurately estimate protein-level FDRs for both competing null hypotheses.
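To make the idea of decoy-based FDR estimation concrete, here is a minimal sketch of the generic target-decoy counting estimator applied at the protein level. It assumes that each protein (target or decoy, e.g. from a reversed or shuffled database) has already been assigned a single score, with higher meaning more confident; the function name `protein_qvalues` and the toy data are hypothetical. It does not distinguish between the two competing null hypotheses discussed above and is not the specific procedure evaluated in the study.

```python
from typing import List, Tuple

def protein_qvalues(scored: List[Tuple[str, float, bool]]) -> List[Tuple[str, float]]:
    """Estimate q-values for target proteins by target-decoy counting.

    scored: (protein_id, score, is_decoy); higher score = more confident.
    Returns (protein_id, q_value) for target proteins, best score first.
    Ties in score are not treated specially in this sketch.
    """
    ranked = sorted(scored, key=lambda x: x[1], reverse=True)
    targets = decoys = 0
    fdrs = []
    for _, _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR when thresholding at this score: decoys / targets above it
        fdrs.append(min(1.0, decoys / max(targets, 1)))
    # q-value: smallest estimated FDR at this or any less stringent threshold
    qvals = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvals[i] = running_min
    return [(pid, q) for (pid, _, is_decoy), q in zip(ranked, qvals) if not is_decoy]

# Hypothetical toy input: four target proteins (P*) and two decoys (D*)
proteins = [
    ("P1", 9.1, False), ("P2", 8.7, False), ("D1", 7.9, True),
    ("P3", 7.5, False), ("P4", 6.0, False), ("D2", 5.5, True),
]
for pid, q in protein_qvalues(proteins):
    print(pid, q)
```

With this toy input, P1 and P2 receive a q-value of 0.0, while P3 and P4 receive 0.25, so a 1% protein-level FDR threshold would retain only P1 and P2.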
Highlights
• Unified identification and quantification error rates for protein quantification.
• Error propagation using graphical models and Bayesian statistics.
• Account for uncertainty of missing values instead of overconfident point estimates.
• Control of differential expression false discovery rate at increased sensitivity.
A current trend in proteomics is to acquire data in a “single-shot” by LC–MS/MS because it simplifies workflows and promises better throughput and quantitative accuracy than schemes that involve extensive sample fractionation. However, single-shot approaches can suffer from limited proteome coverage when performed by data-dependent acquisition (ssDDA) on nanoflow LC systems. For applications where sample quantities are not scarce, this study shows that high proteome coverage can be obtained using a microflow LC–MS/MS system operating a 1 mm i.d. × 150 mm column at a flow rate of 50 μL/min and coupled to an Orbitrap HF-X mass spectrometer. The results demonstrate the identification of ∼9 000 proteins from 50 μg of protein digest from Arabidopsis roots, 7 500 from mouse thymus, and 7 300 from human breast cancer cells within a single 3 h run. The dynamic range of protein quantification measured by the iBAQ approach spanned 5 orders of magnitude, and replicate analysis showed that the median coefficient of variation was below 20%. Together, this study shows that ssDDA by μLC–MS/MS is a robust method for comprehensive and large-scale proteome analysis and may be further extended to more rapid chromatography and data-independent acquisition approaches in the future.
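The quantitative figures of merit reported above (iBAQ values, dynamic range in orders of magnitude, and median coefficient of variation across replicates) can be sketched as follows. This assumes the commonly used iBAQ definition (summed peptide intensity divided by the number of theoretically observable tryptic peptides); the 6 to 30 residue length window, the no-missed-cleavage digest, and all function names are illustrative assumptions, not the exact settings used in the study. Splitting on a zero-width pattern requires Python 3.7 or later.

```python
import math
import re
import statistics
from typing import Dict, List

def tryptic_peptides(sequence: str) -> List[str]:
    # In-silico trypsin digest: cleave after K or R unless followed by P (no missed cleavages)
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]

def ibaq(summed_intensity: float, sequence: str, min_len: int = 6, max_len: int = 30) -> float:
    # Normalize summed peptide intensity by the number of theoretically observable peptides
    n_obs = sum(min_len <= len(p) <= max_len for p in tryptic_peptides(sequence))
    return summed_intensity / n_obs if n_obs else float("nan")

def dynamic_range_orders(values: List[float]) -> float:
    # Orders of magnitude spanned by the positive quantities: log10(max / min)
    positive = [v for v in values if v > 0]
    return math.log10(max(positive) / min(positive))

def median_cv(replicates: Dict[str, List[float]]) -> float:
    # Per-protein CV = sample standard deviation / mean; report the median across proteins
    cvs = [statistics.stdev(v) / statistics.mean(v) for v in replicates.values() if len(v) >= 2]
    return statistics.median(cvs)

# Hypothetical example values
print(tryptic_peptides("PEPTIDEKSAMPLERPEPTIDERAK"))
print(ibaq(1000.0, "PEPTIDEKSAMPLERPEPTIDERAK"))
print(dynamic_range_orders([1.0, 10.0, 1e5]))
print(median_cv({"A": [100, 100, 100], "B": [90, 100, 110], "C": [80, 100, 120]}))
```

In this toy sequence, the R before P is not cleaved, and only two of the three peptides fall in the assumed length window, so the summed intensity is divided by 2.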