http://proteomics.informatics.iupui.edu/software/toppic/ Contact: xwliu@iupui.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Capillary zone electrophoresis (CZE)-tandem mass spectrometry (MS/MS) has been recognized as a useful tool for top-down proteomics. However, its performance for deep top-down proteomics is still dramatically lower than that of the widely used reversed-phase liquid chromatography (RPLC)-MS/MS. We present an orthogonal multidimensional separation platform that couples size exclusion chromatography (SEC) and RPLC-based protein prefractionation to CZE-MS/MS for deep top-down proteomics of Escherichia coli. The platform generated high peak capacity (∼4000) for separation of intact proteins, leading to the identification of 5700 proteoforms from the Escherichia coli proteome. These data represent a 10-fold improvement in the number of proteoform identifications compared with previous CZE-MS/MS studies and constitute the largest bacterial top-down proteomics data set reported to date. The performance of the CZE-MS/MS-based platform is comparable to that of state-of-the-art RPLC-MS/MS-based systems in terms of the number of proteoform identifications and the instrument time.
Capillary zone electrophoresis-electrospray ionization-tandem mass spectrometry (CZE-ESI-MS/MS) has been recognized as an invaluable platform for top-down proteomics. However, the scale of top-down proteomics using CZE-MS/MS is still limited due to the low loading capacity and narrow separation window of CZE. In this work, for the first time we systematically evaluated the dynamic pH junction method for focusing of intact proteins during CZE-MS. The optimized dynamic pH junction based CZE-MS/MS approached 1-μL loading capacity, 90-min separation window and high peak capacity (~280) for characterization of an Escherichia coli proteome. The results represent the largest loading capacity and the highest peak capacity of CZE for top-down characterization of complex proteomes. Single-shot CZE-MS/MS identified about 2,800 proteoform-spectrum matches, nearly 600 proteoforms, and 200 proteins from the Escherichia coli proteome with spectrum-level false discovery rate (FDR) less than 1%. The number of identified proteoforms in this work is over three times higher than that in previous single-shot CZE-MS/MS studies. Truncations, N-terminal methionine excision, signal peptide removal and some post-translational modifications including oxidation and acetylation were detected.
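As a rough check on the figures above, the following sketch assumes the common approximation that peak capacity equals the separation window divided by the average peak width. The abstract does not state which definition the authors used, and the function name is illustrative only.

```python
def implied_peak_width_s(window_min: float, peak_capacity: float) -> float:
    """Average peak width in seconds, ASSUMING n_c ~ window / width.

    This definition is an assumption made for illustration; the abstract
    does not say how peak capacity was calculated.
    """
    if window_min <= 0 or peak_capacity <= 0:
        raise ValueError("window and peak capacity must be positive")
    return window_min * 60.0 / peak_capacity


# Values quoted in the abstract: 90-min window, peak capacity of about 280.
width_s = implied_peak_width_s(90, 280)
print(f"implied average peak width: {width_s:.1f} s")
```

Under that assumption, the reported numbers correspond to an average peak width of roughly 19 s.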
Native proteomics aims to characterize complex proteomes under native conditions and ultimately produces a full picture of endogenous protein complexes in cells. It requires novel analytical platforms for high-resolution and liquid-phase separation of protein complexes prior to native mass spectrometry (MS) and MS/MS. In this work, size-exclusion chromatography (SEC)-capillary zone electrophoresis (CZE)-MS/MS was developed for native proteomics in discovery mode, resulting in the identification of 144 proteins, 672 proteoforms, and 23 protein complexes from the Escherichia coli proteome. The protein complexes include four protein homodimers, 16 protein-metal complexes, two protein-[2Fe-2S] complexes, and one protein-glutamine complex. Half of them have not been reported in the literature. This work represents the first example of online liquid-phase separation-MS/MS for the characterization of a complex proteome under the native condition, offering the proteomics community an efficient and simple platform for native proteomics.
Labeling approaches using isobaric chemical tags (e.g., isobaric tagging for relative and absolute quantification, iTRAQ and tandem mass tag, TMT) have been widely applied for the quantification of peptides and proteins in bottom-up MS. However, until recently, successful applications of these approaches to top-down proteomics have been limited because proteins tend to precipitate and “crash” out of solution during TMT labeling of complex samples, making the quantification of such samples difficult. In this study, we report a top-down TMT MS platform for confidently identifying and quantifying low molecular weight intact proteoforms in complex biological samples. To reduce the sample complexity and remove large proteins from complex samples, we developed a filter-SEC technique that combines a molecular weight cutoff filtration step with high-performance size exclusion chromatography (SEC) separation. No protein precipitation was observed in filtered samples under the intact protein-level TMT labeling conditions. The proposed top-down TMT MS platform enables high-throughput analysis of intact proteoforms, allowing for the identification and quantification of hundreds of intact proteoforms from Escherichia coli cell lysates. To our knowledge, this represents the first high-throughput TMT labeling-based, quantitative, top-down MS analysis suitable for complex biological samples.
There are two approaches for de novo protein sequencing: Edman degradation and mass spectrometry (MS). Existing MS-based methods characterize a novel protein by assembling tandem mass spectra of overlapping peptides generated from multiple proteolytic digestions of the protein. Because each tandem mass spectrum covers only a short peptide of the target protein, the key to high-coverage protein sequencing is to find spectral pairs from overlapping peptides so that tandem mass spectra can be assembled into longer ones. However, overlapping regions of peptides may be too short to be confidently identified. High-resolution mass spectrometers have become accessible to many laboratories. These mass spectrometers are capable of analyzing molecules of large mass values, boosting the development of top-down MS. Top-down tandem mass spectra cover whole proteins. However, top-down tandem mass spectra, even when combined, rarely provide full ion fragmentation coverage of a protein. We propose an algorithm, TBNovo, for de novo protein sequencing by combining top-down and bottom-up MS. In TBNovo, a top-down tandem mass spectrum is utilized as a scaffold, and bottom-up tandem mass spectra are aligned to the scaffold to increase sequence coverage. Experiments on data sets of two proteins showed that TBNovo achieved high sequence coverage and high sequence accuracy.
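The scaffold idea can be illustrated with a naive shift-voting sketch. This is not TBNovo's actual algorithm, which the abstract does not describe: it assumes both spectra have already been converted to lists of prefix masses, and the function names, tolerance, and toy sequence (GASVLGA with peptide SVLG) are hypothetical.

```python
from itertools import product


def best_shift(scaffold, peptide, tol=0.02):
    """Return (shift, matches) for the mass offset that places the most
    peptide prefix masses onto scaffold prefix masses.

    Naive illustration only: every scaffold/peptide mass difference is
    tried as a candidate offset and the best-supported one is kept.
    """
    best = (0.0, 0)
    for s_mass, p_mass in product(scaffold, peptide):
        shift = s_mass - p_mass
        matches = sum(
            any(abs(p + shift - s) <= tol for s in scaffold) for p in peptide
        )
        if matches > best[1]:
            best = (shift, matches)
    return best


def new_masses(scaffold, peptide, shift, tol=0.02):
    """Shifted peptide masses that the scaffold does not already contain."""
    return [
        p + shift
        for p in peptide
        if not any(abs(p + shift - s) <= tol for s in scaffold)
    ]


# Toy top-down scaffold: prefix masses of GASVLGA with the GASVL peak missing.
scaffold = [57.02146, 128.05857, 215.09060, 314.15901, 484.26453]
# Toy bottom-up spectrum: prefix masses of the internal peptide SVLG.
peptide = [87.03203, 186.10044, 299.18450, 356.20596]

shift, matches = best_shift(scaffold, peptide)
filled = new_masses(scaffold, peptide, shift)
print(round(shift, 5), matches, [round(m, 5) for m in filled])
```

In this toy case the peptide aligns at an offset equal to the mass of the preceding residues (GA), three of its four masses land on scaffold peaks, and the fourth supplies the prefix mass that the top-down spectrum lacked.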
Various proteoforms may be generated from a single gene due to primary structure alterations (PSAs) such as genetic variations, alternative splicing, and post-translational modifications (PTMs). Top-down mass spectrometry is capable of analyzing intact proteins and identifying patterns of multiple PSAs, making it the method of choice for studying complex proteoforms. In top-down proteomics, proteoform identification is often performed by searching tandem mass spectra against a protein sequence database that contains only one reference protein sequence for each gene or transcript variant in a proteome. Because of the incompleteness of the protein database, an identified proteoform may contain unknown PSAs compared with the reference sequence. Proteoform characterization aims to identify and localize the PSAs in a proteoform. Although many software tools have been proposed for proteoform identification by top-down mass spectrometry, the characterization of proteoforms in identified proteoform-spectrum matches still relies mainly on manual annotation. We propose to use the Modification Identification Score (MIScore), which is based on Bayesian models, to automatically identify and localize PTMs in proteoforms. Experiments showed that the MIScore is accurate in identifying and localizing one or two modifications.
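The general idea of Bayesian site localization can be sketched as follows. This is not the MIScore model, which the abstract does not specify: it assumes a toy binomial likelihood in which each theoretical fragment peak is observed with a fixed probability under the correct site, plus a uniform prior over candidate sites. All names and numbers are hypothetical.

```python
def site_posteriors(matched_by_site, n_theoretical, p_hit=0.7):
    """Posterior probability of each candidate modification site.

    Toy model (an assumption, not MIScore): with the modification placed
    at the true site, each of the n theoretical fragment peaks is observed
    independently with probability p_hit. The prior over sites is uniform,
    so the posterior is the normalized likelihood.
    """
    likelihood = {
        site: (p_hit ** k) * ((1.0 - p_hit) ** (n_theoretical - k))
        for site, k in matched_by_site.items()
    }
    total = sum(likelihood.values())
    return {site: value / total for site, value in likelihood.items()}


# Hypothetical counts of matched fragment peaks (out of 10) per candidate site.
matched = {"S3": 9, "T7": 6, "S12": 5}
posteriors = site_posteriors(matched, n_theoretical=10)
best_site = max(posteriors, key=posteriors.get)
print(best_site, {site: round(p, 3) for site, p in posteriors.items()})
```

A site whose placement explains clearly more fragment peaks than the alternatives receives most of the posterior mass, while near-equal counts leave the localization ambiguous.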