The identification of small proteins and peptides (below ca. 100−150 amino acids) in complex biological samples is hampered by the dominance of higher-molecular-weight proteins. At the same time, the increasing knowledge about alternative or short open reading frames creates a need for methods that allow the existence of the corresponding gene products to be proven in proteomics experiments. We present an acetonitrile-based precipitation methodology that depletes the majority of proteins above ca. 15 kDa. Parameters such as depletion mixture composition, pH, and temperature were optimized using a model protein mixture, and the method was evaluated in comparison with the established differential solubility method. The approach was applied to the analysis of the low-molecular-weight proteome of the archaeon Methanosarcina mazei by means of LC−MS. The data clearly show a beneficial effect from a reduction of complexity, especially in terms of the quality of MS/MS-based identification of small proteins. With minimal sample manipulation, this fast, detergent-free method allowed the successful identification of several previously unidentified short open reading frame encoded peptides in M. mazei.
In top-down (TD)
proteomics, prefractionation prior to mass spectrometric
(MS) analysis is a crucial step for both the high confidence identification
of proteoforms and increased proteome coverage. In addition to liquid-phase
separations, gas-phase fractionation strategies such as field asymmetric
ion mobility spectrometry (FAIMS) have been shown to be highly beneficial
in TD proteomics. However, so far, only external compensation voltage
(CV) stepping has been demonstrated for TD proteomics, i.e., single
CVs were applied for each run. Here, we investigated the use of internal
CV stepping (multiple CVs per acquisition) for single-shot TD analysis,
which offers substantial advantages in terms of measurement time and the amount
of sample required. In addition, MS parameters were optimized for
the individual CVs since different CVs target certain mass ranges.
For example, small proteoforms identified mainly with more negative
CVs can be identified with lower resolution and number of microscans
than larger proteins identified primarily via less negative CVs. We
investigated the optimal combination and number of CVs for different
gradient lengths and validated the optimized settings with the low-molecular-weight
proteome of CaCo-2 cells obtained using a range of different sample
preparation techniques. Compared to measurements without FAIMS, both
the number of identified protein groups (+60–94%) and proteoforms
(+46–127%) and their confidence were significantly increased,
while the measurement time remained identical. In total, we identified
684 protein groups and 2675 proteoforms from CaCo-2 cells in less
than 24 h using the optimized multi-CV method.
The identification
of proteins below approximately 70–100
amino acids in bottom-up proteomics is still a challenging task due
to the limited number of peptides generated by proteolytic digestion.
This includes the short open reading frame-encoded peptides (SEPs),
which are a subset of the small proteins that were not previously
annotated or that are alternatively encoded. Here, we systematically
investigated the use of multiple proteases (trypsin, chymotrypsin,
LysC, LysargiNase, and GluC) in GeLC–MS/MS analysis to improve
the sequence coverage and the number of identified peptides for small
proteins, with a focus on SEPs, in the archaeon Methanosarcina
mazei. Combining the data of all proteases, we identified
63 small proteins and an additional 28 SEPs with at least two unique
peptides, while only 55 small proteins and 22 SEPs could be identified
using trypsin only. For 27 small proteins and 12 SEPs, complete
sequence coverage was achieved. Moreover, for five SEPs, incorrectly
predicted translation start points or potential in vivo proteolytic processing were identified, confirming the data of a
previous top-down proteomics study of this organism. The results show
clearly that a multi-protease approach improves the identification
and molecular characterization of small proteins and SEPs. LC–MS
data: ProteomeXchange PXD023921.
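The reasoning behind the multi-protease approach can be illustrated with a minimal in silico digestion sketch. It is not the authors' workflow: the cleavage rules are deliberately simplified (no missed cleavages, no proline exception, GluC restricted to glutamate, chymotrypsin restricted to F/W/Y), the peptide length window of 6 to 30 residues is an assumed detectability filter, and the example sequence and all function names are hypothetical.

```python
# Simplified cleavage rules (assumption): residues recognised and the side
# of the residue at which the protease cuts ("C" = after, "N" = before).
RULES = {
    "trypsin": ("KR", "C"),
    "LysC": ("K", "C"),
    "GluC": ("E", "C"),
    "chymotrypsin": ("FWY", "C"),
    "LysargiNase": ("KR", "N"),
}


def digest(sequence, protease, min_len=6, max_len=30):
    """Return fully cleaved peptides whose length lies in [min_len, max_len]."""
    residues, side = RULES[protease]
    cuts = [0]
    for i, aa in enumerate(sequence):
        if aa in residues:
            pos = i + 1 if side == "C" else i
            # Ignore cuts at the very start or end of the sequence.
            if 0 < pos < len(sequence):
                cuts.append(pos)
    cuts.append(len(sequence))
    peptides = [sequence[a:b] for a, b in zip(cuts, cuts[1:])]
    return [p for p in peptides if min_len <= len(p) <= max_len]


def coverage(sequence, peptides):
    """Fraction of residues covered by at least one peptide."""
    covered = [False] * len(sequence)
    for pep in peptides:
        start = sequence.find(pep)
        while start != -1:
            for k in range(start, start + len(pep)):
                covered[k] = True
            start = sequence.find(pep, start + 1)
    return sum(covered) / len(sequence)


if __name__ == "__main__":
    # Hypothetical 33-residue small protein, used only for illustration.
    seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
    combined = []
    for name in RULES:
        peps = digest(seq, name)
        combined.extend(peps)
        print(f"{name}: {len(peps)} peptides, coverage {coverage(seq, peps):.2f}")
    print(f"combined coverage {coverage(seq, combined):.2f}")
```

Because each protease cuts at different residues, peptides that are too short or too long with one enzyme can fall into the usable window with another, so the combined coverage can never be lower than that of any single protease.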
The recent discovery of alternative open reading frames creates a need for suitable analytical approaches to verify their translation and to characterize the corresponding gene products at the molecular level. As the analysis of small proteins within a background proteome by means of classical bottom-up proteomics is challenging, method development for the analysis of small open reading frame encoded peptides (SEPs) has become a focal point for research. Here, we highlight bottom-up and top-down proteomics approaches established for the analysis of SEPs in both pro- and eukaryotes. Major steps of analysis, including sample preparation and (small) proteome isolation, separation and mass spectrometry, data interpretation and quality control, quantification, the analysis of post-translational modifications, and exploration of functional aspects of the SEPs by means of proteomics technologies are described. These methods do not exclusively cover the analytics of SEPs but simultaneously include the low molecular weight proteome, and moreover, can also be used for the proteome-wide analysis of proteolytic processing events.