How to talk about protein‐level false discovery rates in shotgun proteomics

The, Matthew; Tasnim, Ayesha; Käll, Lukas

doi:10.1002/pmic.201500431

Cited by 50 publications

(62 citation statements)

References 33 publications

“…stemming from the set of 1000 non-present PrESTs or from the set of the mixtures not used in the sample) as being incorrectly matched. As the correct matches only map to present proteins, whereas incorrect matches distribute over both present and absent proteins, we also normalized the Entrapment FDR by the prior probability of the PrEST to be absent, the so-called π_A [4].…”

Section: Data Processing (mentioning)

confidence: 99%
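The normalization described in this statement can be sketched as follows. This is a minimal illustration, not the citing authors' code: it assumes the entrapment FDR is estimated by dividing the number of reported proteins known to be absent by π_A, to account for incorrect matches that land on present proteins, and then dividing by the total number of reported proteins. The function name, parameter names and the example numbers are hypothetical.

```python
def entrapment_fdr(n_absent_hits: int, n_reported: int, pi_a: float) -> float:
    """Estimate an entrapment FDR normalized by pi_a (illustrative sketch).

    n_absent_hits: reported proteins known to be absent from the sample
    n_reported:    all reported proteins (present and absent)
    pi_a:          assumed prior probability that a protein is absent
    """
    if n_reported <= 0:
        raise ValueError("n_reported must be positive")
    if not 0.0 < pi_a <= 1.0:
        raise ValueError("pi_a must be in (0, 1]")
    # Assumption: incorrect matches spread over present and absent proteins,
    # and only a fraction pi_a of them hit absent proteins. Scaling the
    # observed absent hits by 1 / pi_a estimates all incorrect matches.
    estimated_incorrect = n_absent_hits / pi_a
    # Cap at 1.0 because an FDR cannot exceed 100%.
    return min(1.0, estimated_incorrect / n_reported)


if __name__ == "__main__":
    # Hypothetical numbers: 5 absent proteins among 200 reported, pi_a = 0.5.
    print(entrapment_fdr(5, 200, 0.5))
```

Without the division by π_A, counting only the absent hits would underestimate the error, since incorrect matches to present proteins go unnoticed.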

“…Currently, there are two available methods to determine the accuracy of inference procedures and their error estimates: (i) simulations of proteomics experiments and (ii) analysis of experiments on samples with known protein content. By simulating the proteolytic digestion and the subsequent matching of mass spectra to peptides [2,3,4] one can obtain direct insights into how well the simulated absence or presence of a protein is reflected by a protein inference procedure. However, there is always the risk that the assumptions of the simulations are diverging from the complex nature of a mass spectrometry experiment.…”

Section: Introduction (mentioning)

confidence: 99%

“…However, there is always the risk that the assumptions of the simulations are diverging from the complex nature of a mass spectrometry experiment. Hence, accurate predictions on simulated data can only be viewed as a minimum requirement for a method to be considered accurate [4].…”

Section: Introduction (mentioning)

confidence: 99%

A protein standard that emulates homology for the characterization of protein inference algorithms

Edfors, Perez‐Riverol, Payne, et al. 2017 (preprint, self-citation)

A natural way to benchmark the performance of an analytical experimental setup is to use samples of known content, and see to what degree one can correctly infer the content of such a sample from the data. For shotgun proteomics, one of the inherent problems of interpreting data is that the measured analytes are peptides and not the actual proteins themselves. As some proteins share proteolytic peptides, there might be more than one possible causative set of proteins resulting in a given set of peptides and there is a need for mechanisms that infer proteins from lists of detected peptides. A weakness of commercially available samples of known content is that they consist of proteins that are deliberately selected for producing tryptic peptides that are unique to a single protein. Unfortunately, such samples do not expose any complications in protein inference. For a realistic benchmark of protein inference procedures, there is, therefore, a need for samples of known content where the present proteins share peptides with known absent proteins. Here, we present such a standard, that is based on E. coli expressed human protein fragments. To illustrate the usage of this standard, we benchmark a set of different protein inference procedures on the data. We observe that inference procedures excluding shared peptides provide more accurate estimates of errors compared to methods that include information from shared peptides, while still giving a reasonable performance in terms of the number of identified proteins. We also demonstrate that using a sample of known protein content without proteins with shared tryptic peptides can give a false sense of accuracy for many protein inference methods.

“…The need for an integrated model becomes clear once one considers the most natural hypothesis [33] for protein quantification: one strives to estimate the combined probability that a particular protein is (i) correctly identified, (ii) correctly quantified and (iii) present in a different quantity between treatment groups. The separate probabilities of (i), (ii) and (iii) are less interesting individually and worse, one is easily lulled into a false sense of reliability by claims of control of the FDR in individual steps.…”

Section: Introduction (mentioning)

confidence: 99%

“…Firstly, proteins are selected for further analysis based on identification FDR. However, the identification FDR is an estimate of the evidence for the presence of proteins [33], and not a measure of how quantifiable they are i.e. their peptides being detected across conditions and being in the quantifiable range.…”

Section: Introduction (mentioning)

confidence: 99%

Integrated identification and quantification error probabilities for shotgun proteomics

Käll 2018 (preprint, self-citation)

Protein quantification by label-free shotgun proteomics experiments is plagued by a multitude of error sources. Typical pipelines for identifying differentially expressed proteins use intermediate filters in an attempt to control the error rate. However, they often ignore certain error sources and, moreover, regard filtered lists as completely correct in subsequent steps. These two indiscretions can easily lead to a loss of control of the false discovery rate (FDR). We propose a probabilistic graphical model, Triqler, that propagates error information through all steps, employing distributions in favor of point estimates, most notably for missing value imputation. The model outputs posterior probabilities for fold changes between treatment groups, highlighting uncertainty rather than hiding it. We analyzed 3 engineered datasets and achieved FDR control and high sensitivity, even for truly absent proteins. In a bladder cancer clinical dataset we discovered 35 proteins at 5% FDR, with the original study discovering none at this threshold. Compellingly, these proteins showed enrichment for functional annotation terms. The model executes in minutes and is freely available at https://pypi.org/project/triqler/.

Profiling the Protein Targets of Unmodified Bio‐Active Molecules with Drug Affinity Responsive Target Stability and Liquid Chromatography/Tandem Mass Spectrometry

et al. 2020

Identifying the target proteins of bioactive small molecules is a key step in understanding mode-of-action of the drug and addressing the underlying mechanisms responsible for a particular phenotype. Proteomics has been successfully used to elucidate the target protein profiles of unmodified and ligand-modified bioactive small molecules. In the latter approach, compounds can be modified via click chemistry and combined with activity-based protein profiling. Target proteins are then enriched by performing a pull-down with the modified ligand. Methods that utilize unmodified bioactive small molecules include the cellular thermal shift assay, thermal proteome profiling, stability of proteins from rates of oxidation, and the drug affinity responsive target stability (DARTS) determination (or read-out). This review highlights recent proteomic approaches utilizing data-dependent analysis and data-independent analysis to identify target proteins by DARTS. When combined with liquid chromatography/tandem mass spectrometry, DARTS enables the identification of proteins that bind to drug molecules that leads to a conformational change in the target protein(s). In addition, an effective strategy is proposed for selecting the target protein(s) from within the pool of analyzed candidates. With additional complementary methods, the biologically relevant target proteins that bind to the small bio-active molecules can be further validated.
