Label-free quantification (LFQ) is one of the most efficient approaches for quantifying proteome differences between multiple states of a biological system. LFQ aims to reproducibly identify and quantify peptides across multiple liquid-chromatography-coupled tandem mass spectrometry (LC-MS/MS) experiments. In the popular data-dependent acquisition (DDA) approach named Top-N DDA, the appearance of peptide-like signals in a "survey" mass spectrum triggers tandem mass spectrometry (MS/MS) events targeting the N most-abundant precursor ions. Previous studies have shown that, due to the limited speed of a mass spectrometer, the majority of peptide ions detected in MS1 are not targeted in MS/MS, especially when a nonfractionated complex sample is analyzed (1, 2). This low sampling efficiency (<50%), combined with the stochastic nature of precursor selection and the limited efficiency of MS/MS identification (<70%) (3), frequently causes the absence of MS/MS identification for an individual peptide in some LC-MS/MS experiments ("runs") within a larger dataset, even when replicate measurements are made (4). This deficiency is known as the missing value problem in LFQ. The problem significantly limits the size of the DDA-acquired proteomics dataset across which reliable quantification can be made for each protein (5, 6).

One of the causes of the missing value problem is the traditional focus on identifying a peptide as opposed to quantifying it. For historical reasons, peptide sequence identification has been considered the focal point and the most important step in the whole proteomics procedure, while quantification came almost as an afterthought (7, 8). This dominant proteomics paradigm can be characterized as the identification-centered approach, also known as the spectrum-centric approach (9). Only gradually has the missing value problem been recognized as one of the biggest drawbacks of the DDA approach (4, 5).
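The Top-N selection and the two efficiency figures quoted above can be made concrete with a minimal Python sketch. It is an illustration, not an instrument's acquisition logic: the function names are hypothetical, the 50% and 70% values are the upper limits cited in the text, and treating runs as independent is a simplifying assumption made here only to show how quickly complete-case coverage shrinks.

```python
from typing import List, Tuple

Precursor = Tuple[float, float]  # (m/z, intensity); a hypothetical minimal representation


def select_top_n(precursors: List[Precursor], n: int) -> List[Precursor]:
    """Pick the n most intense precursors from one survey (MS1) scan."""
    return sorted(precursors, key=lambda p: p[1], reverse=True)[:n]


def identification_upper_bound(sampling_eff: float, ident_eff: float) -> float:
    """Chance that a detected peptide ion is both targeted and identified.

    Assumes ident_eff is the identification rate among targeted ions.
    """
    return sampling_eff * ident_eff


def complete_case_probability(p_identified: float, n_runs: int) -> float:
    """Chance of an identification in every run, assuming independent runs."""
    return p_identified ** n_runs


survey_scan = [(400.2, 1e5), (500.3, 5e6), (600.4, 2e4)]
targeted = select_top_n(survey_scan, n=2)

# With the limits quoted in the text (<50% sampling, <70% identification),
# fewer than 35% of detected peptide ions are identified in a single run.
bound = identification_upper_bound(0.5, 0.7)
all_three_runs = complete_case_probability(bound, n_runs=3)
```

Under these assumptions, the chance that a given peptide is identified in every run falls geometrically with the number of runs, which is the missing value problem in miniature.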
To address the reproducibility issue in MS/MS identification, several alternative data acquisition strategies have been suggested, including targeted (10) and semi-targeted (11, 12) approaches. However, none of the improved DDA strategies has come close to data-independent acquisition (DIA) in solving the missing value problem (13, 14). DIA, however, typically provides somewhat lower depth and breadth of proteome coverage than the DDA methods.

In our opinion, the DDA-associated missing value problem is caused by the sequential execution of two independent processes: peptide identification by MS/MS and peptide quantification by MS1. At first glance, performing MS1-based quantification simultaneously with MS/MS identification should provide an obvious solution to the missing value problem. Since MS1 spectra contain many more peptide ions than are selected for MS/MS in DDA (or identified in DIA), the peptide's mass information is practically always present when an iden-
Most implementations of mass spectrometry-based proteomics involve enzymatic digestion of proteins, expanding the analysis to multiple proteolytic peptides for each protein. Currently, there is no consensus on how to summarize peptide abundances into protein concentrations. Such efforts are complicated by the fact that error control is normally applied to the identification process and does not directly control the errors that link peptide abundance measures to protein concentration. Peptides that result from suboptimal digestion or that are partially modified are not representative of the protein concentration. Without a mechanism to remove such unrepresentative peptides, their abundances adversely affect the estimate of their protein's concentration. Here, we present a relative quantification approach, Diffacto, that applies factor analysis to extract the covariation of peptide abundances. The method enables a weighted geometric average summarization and automatic elimination of incoherent peptides. We demonstrate, based on a set of controlled label-free experiments using standard mixtures of proteins, that the covariation structure extracted by the factor analysis accurately reflects protein concentrations. In the data set filtered at 1% peptide-spectrum match-level FDR, as many as 11% of the peptides have abundance differences incoherent with the other peptides attributed to the same protein. If not controlled, such contradicting peptide abundances have a severe impact on protein quantification. When summing the quantities of each protein's three most abundant peptides, we note that as many as 14% of the proteins are estimated to have a negative correlation with their actual concentration differences between samples. Diffacto reduced the proportion of such obviously incorrectly quantified proteins to 1.6%. Furthermore, by analyzing clinical data sets from two breast cancer studies, our method revealed the persistent proteomic signatures linked to three subtypes of breast cancer.
We conclude that Diffacto can facilitate the interpretation and enhance the utility of most types of proteomics data.
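The weighted geometric average summarization mentioned above can be sketched in a few lines of Python. This is not the Diffacto implementation. The per-peptide weights are hypothetical inputs standing in for what the factor analysis would provide, and the function name and the `min_weight` cutoff for dropping incoherent peptides are assumptions made for illustration.

```python
import numpy as np


def weighted_geometric_mean(abundances, weights, min_weight=0.5):
    """Summarize peptide abundances into one protein-level value per sample.

    abundances: array of shape (n_peptides, n_samples), strictly positive.
    weights:    array of shape (n_peptides,), hypothetical coherence weights.
    min_weight: illustrative cutoff; peptides below it are excluded.
    """
    abundances = np.asarray(abundances, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if np.any(abundances <= 0):
        raise ValueError("abundances must be strictly positive")

    keep = weights >= min_weight
    if not np.any(keep):
        raise ValueError("no peptide passes the weight cutoff")

    # Weighted arithmetic mean in log space equals a weighted geometric mean.
    log_mean = np.average(np.log(abundances[keep]), axis=0, weights=weights[keep])
    return np.exp(log_mean)


# Three peptides across two samples; the third runs against the other two
# and carries a low (hypothetical) weight, so it is excluded.
peptides = [[1.0, 4.0],
            [4.0, 16.0],
            [100.0, 1.0]]
protein = weighted_geometric_mean(peptides, weights=[1.0, 1.0, 0.1])
```

Averaging in log space keeps the summary on a ratio scale, so one high-intensity peptide does not dominate the protein-level estimate the way it would in a plain sum.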
Based on the conventional data-dependent acquisition strategy of shotgun proteomics, we present a new workflow, DeMix, which significantly increases the efficiency of peptide identification for in-depth shotgun analysis of complex proteomes. Capitalizing on the high resolution and mass accuracy of Orbitrap-based tandem mass spectrometry, we developed a simple deconvolution method of “cloning” chimeric tandem spectra for cofragmented peptides. In addition to a database search, a simple rescoring scheme utilizes mass accuracy and converts the unwanted cofragmenting events into a surprising advantage of multiplexing. With the combination of cloning and rescoring, we obtained on average nine peptide-spectrum matches per second on a Q-Exactive workbench, whereas the actual MS/MS acquisition rate was close to seven spectra per second. This efficiency boost to 1.24 identified peptides per MS/MS spectrum enabled analysis of over 5000 human proteins in single-dimensional LC-MS/MS shotgun experiments with only a two-hour gradient. These findings suggest a change in the dominant “one MS/MS spectrum - one peptide” paradigm for data acquisition and analysis in shotgun data-dependent proteomics. DeMix also demonstrated higher robustness than conventional approaches, with lower variation among the results of consecutive LC-MS/MS runs.
Deconvolution of targets and action mechanisms of anticancer compounds is fundamental in drug development. Here, we report on ProTargetMiner, a publicly available and expandable proteome signature library of anticancer molecules in cancer cell lines. Based on 287 A549 adenocarcinoma proteomes affected by 56 compounds, the main dataset contains 7,328 proteins and 1,307,859 refined protein-drug pairs. These proteomic signatures cluster by compound targets and action mechanisms. The targets and mechanistic proteins are deconvoluted by partial least squares modeling, provided through the website http://protargetminer.genexplain.com. For 9 molecules representing the most diverse mechanisms and the common cancer cell lines MCF-7, RKO and A549, deep proteome datasets are obtained. Combining data from the three cell lines highlights common drug targets and cell-specific differences. The database can be easily extended and merged with new compound signatures. ProTargetMiner serves as a chemical proteomics resource for the cancer research community, and can become a valuable tool in drug discovery.