Mass spectrometry is the method of choice for deep and reliable exploration of the (human) proteome. Targeted mass spectrometry reliably detects and quantifies pre-determined sets of proteins in a complex biological matrix and is used in studies that rely on the quantitatively accurate and reproducible measurement of proteins across multiple samples. It requires the one-time, a priori generation of a specific measurement assay for each targeted protein. SWATH-MS is a mass spectrometric method that combines data-independent acquisition (DIA) and targeted data analysis and vastly extends the throughput of proteins that can be targeted in a sample compared to selected reaction monitoring (SRM). Here we present a compendium of highly specific assays covering more than 10,000 human proteins and enabling their targeted analysis in SWATH-MS datasets acquired from research or clinical specimens. This resource supports the confident detection and quantification of 50.9% of all human proteins annotated by UniProtKB/Swiss-Prot and is therefore expected to find wide application in basic and clinical research. Data are available via ProteomeXchange (PXD000953-954) and SWATHAtlas (SAL00016-35).
Summary
Deciphering physiological changes that mediate transition of Mycobacterium tuberculosis between replicating and nonreplicating states is essential to understanding how the pathogen can persist in an individual host for decades. We have combined RNA sequencing (RNA-seq) of 5′ triphosphate-enriched libraries with regular RNA-seq to characterize the architecture and expression of M. tuberculosis promoters. We identified over 4,000 transcriptional start sites (TSSs). Strikingly, for 26% of the genes with a primary TSS, the site of transcriptional initiation overlapped with the annotated start codon, generating leaderless transcripts lacking a 5′ UTR and, hence, the Shine-Dalgarno sequence commonly used to initiate ribosomal engagement in eubacteria. Genes encoding proteins with active growth functions were markedly depleted from the leaderless transcriptome, and there was a significant increase in the overall representation of leaderless mRNAs in a starvation model of growth arrest. The high percentage of leaderless genes may have particular importance in the physiology of nonreplicating M. tuberculosis.
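The leaderless classification described in this abstract can be illustrated with a minimal sketch. It assumes 1-based genomic coordinates, a strand label of "+" or "-", and a hypothetical tolerance parameter `max_utr`. The function names and the exact matching criterion are illustrative assumptions and are not taken from the study.

```python
from typing import Iterable, Tuple


def utr_length(tss: int, start_codon: int, strand: str) -> int:
    """Distance from the TSS to the first base of the start codon.

    Positive values mean the TSS lies upstream of the start codon.
    """
    if strand == "+":
        return start_codon - tss
    if strand == "-":
        return tss - start_codon
    raise ValueError(f"unknown strand: {strand!r}")


def is_leaderless(tss: int, start_codon: int, strand: str, max_utr: int = 0) -> bool:
    """True when the transcript starts at (or within max_utr bases of) the start codon.

    max_utr is a hypothetical tolerance; 0 requires an exact overlap.
    """
    return 0 <= utr_length(tss, start_codon, strand) <= max_utr


def leaderless_fraction(genes: Iterable[Tuple[int, int, str]]) -> float:
    """Fraction of (tss, start_codon, strand) records classified as leaderless."""
    records = list(genes)
    if not records:
        raise ValueError("no genes supplied")
    return sum(is_leaderless(*g) for g in records) / len(records)
```

For example, `leaderless_fraction([(100, 100, "+"), (90, 100, "+"), (200, 200, "-"), (230, 200, "-")])` evaluates to 0.5 on this made-up input, since two of the four toy records have a TSS that coincides with the start codon.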
Targeted proteomics by selected/multiple reaction monitoring (S/MRM) or, on a larger scale, by SWATH (sequential window acquisition of all theoretical spectra) MS (mass spectrometry) typically relies on spectral reference libraries for peptide identification. Quality and coverage of these libraries are therefore of crucial importance for the performance of the methods. Here we present a detailed protocol that has been successfully used to build high-quality, extensive reference libraries supporting targeted proteomics by SWATH MS. We describe each step of the process, including data acquisition by discovery proteomics, assertion of peptide-spectrum matches (PSMs), generation of consensus spectra and compilation of MS coordinates that uniquely define each targeted peptide. Crucial steps such as false discovery rate (FDR) control, retention time normalization and handling of post-translationally modified peptides are detailed. Finally, we show how to use the library to extract SWATH data with the open-source software Skyline. The protocol takes 2-3 d to complete, depending on the extent of the library and the computational resources available.
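The FDR control step mentioned in this abstract is commonly carried out with a target-decoy strategy. The sketch below shows that generic approach and is not the protocol's actual tooling. It assumes each PSM is a `(score, is_decoy)` pair where a higher score is better; the function names and the 1% default threshold are illustrative assumptions.

```python
from typing import List, Tuple


def target_decoy_qvalues(
    psms: List[Tuple[float, bool]],
) -> List[Tuple[float, bool, float]]:
    """Return (score, is_decoy, q_value) sorted by descending score.

    FDR at each rank is estimated as (#decoys) / (#targets) among all
    PSMs scoring at least as well. Ties in score are not treated specially.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at this rank or any lower-scoring rank,
    # which makes the values monotone along the ranked list.
    qvalues = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvalues.append(running_min)
    qvalues.reverse()
    return [(s, d, q) for (s, d), q in zip(ranked, qvalues)]


def accepted_target_scores(
    psms: List[Tuple[float, bool]], max_fdr: float = 0.01
) -> List[float]:
    """Scores of target PSMs whose q-value is within max_fdr."""
    return [s for s, d, q in target_decoy_qvalues(psms) if not d and q <= max_fdr]
```

With the toy input `[(10, False), (9, False), (8, True), (7, False), (6, False)]`, only the two top-scoring targets pass at the 1% threshold, because every PSM ranked at or below the decoy carries a q-value of 0.25.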
Significance
Despite the fundamental importance of the surfaceome as a signaling gateway to the cellular microenvironment, it remains difficult to determine which proteoforms reside in the plasma membrane and how they interact to enable context-dependent signaling functions. We applied a machine-learning approach utilizing domain-specific features to develop the accurate surfaceome predictor SURFY and used it to define the human in silico surfaceome of 2,886 proteins. The in silico surfaceome is a public resource which can be used to filter multiomics data to uncover cellular phenotypes and surfaceome markers. By our domain-specific feature machine-learning approach, we show indirectly that the environment (extracellular, cytoplasm, or vesicle) is reflected in the biochemical properties of protein domains reaching into that environment.
Mycobacterium tuberculosis remains a health concern due to its ability to enter a non-replicative dormant state linked to drug resistance. Understanding transitions into and out of dormancy will inform therapeutic strategies. We implemented a universally applicable, label-free approach to estimate absolute cellular protein concentrations on a proteome-wide scale based on SWATH mass spectrometry. We applied this approach to examine proteomic reorganization of M. tuberculosis during exponential growth, hypoxia-induced dormancy, and resuscitation. The resulting data set covering >2,000 proteins reveals how protein biomass is distributed among cellular functions during these states. The stress-induced DosR regulon contributes 20% to cellular protein content during dormancy, whereas ribosomal proteins remain largely unchanged at 5%-7%. Absolute protein concentrations furthermore allow protein alterations to be translated into changes in maximal enzymatic reaction velocities, enhancing understanding of metabolic adaptations. Thus, global absolute protein measurements provide a quantitative description of microbial states, which can support the development of therapeutic interventions.
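The step from absolute protein concentration to maximal reaction velocity in this abstract can be sketched with the standard Michaelis-Menten relation Vmax = kcat × [E]. The sketch assumes that the turnover number kcat is unchanged between conditions, so a fold change in enzyme concentration translates directly into the same fold change in Vmax. The function names, units, and numeric values are illustrative assumptions and are not taken from the study.

```python
def vmax(kcat_per_s: float, enzyme_conc_um: float) -> float:
    """Maximal velocity in µM/s from Vmax = kcat * [E].

    kcat_per_s: turnover number in 1/s (hypothetical input, e.g. from literature).
    enzyme_conc_um: absolute enzyme concentration in µM.
    """
    if kcat_per_s < 0 or enzyme_conc_um < 0:
        raise ValueError("kcat and concentration must be non-negative")
    return kcat_per_s * enzyme_conc_um


def vmax_fold_change(conc_before_um: float, conc_after_um: float) -> float:
    """Fold change in Vmax between two states, assuming constant kcat.

    With kcat fixed, the kcat terms cancel and only the concentration ratio remains.
    """
    if conc_before_um <= 0:
        raise ValueError("reference concentration must be positive")
    return conc_after_um / conc_before_um
```

For instance, an enzyme with a made-up kcat of 10 per second that rises from 2 µM to 6 µM would have its Vmax rise from 20 to 60 µM/s, a threefold increase that `vmax_fold_change(2.0, 6.0)` reports without needing kcat at all.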
SUMMARY
Research advancing our understanding of Mycobacterium tuberculosis (Mtb) biology and complex host-Mtb interactions requires consistent and precise quantitative measurements of Mtb proteins. We describe the generation and validation of a compendium of assays to quantify 97% of the 4,012 annotated Mtb proteins by the targeted mass spectrometric method selected reaction monitoring (SRM). Furthermore, we estimate the absolute abundance for 55% of all Mtb proteins, revealing a dynamic range within the Mtb proteome of over four orders of magnitude, and identify previously un-annotated proteins. As an example of the assay library utility, we monitored the entire Mtb dormancy survival regulon (DosR), which is linked to anaerobic survival and Mtb persistence, and show its dynamic protein-level regulation during hypoxia. In conclusion, we present a publicly available research resource that supports the sensitive, precise, and reproducible quantification of virtually any Mtb protein by a robust and widely accessible mass spectrometric method.
In this Perspective, we discuss developments in mass-spectrometry-based proteomic technology over the past decade from the viewpoint of our laboratory. We also reflect on existing challenges and limitations, and explore the current and future roles of quantitative proteomics in molecular systems biology, clinical research and personalized medicine.