Identifying proteins shorter than approximately 70–100 amino acids
in bottom-up proteomics remains challenging because proteolytic
digestion generates only a limited number of peptides from them.
This includes the short open reading frame-encoded peptides (SEPs),
which are a subset of the small proteins that were not previously
annotated or that are alternatively encoded. Here, we systematically
investigated the use of multiple proteases (trypsin, chymotrypsin,
LysC, LysargiNase, and GluC) in GeLC–MS/MS analysis to improve
the sequence coverage and the number of identified peptides for small
proteins, with a focus on SEPs, in the archaeon Methanosarcina
mazei. Combining the data from all proteases, we identified
63 small proteins and an additional 28 SEPs with at least two unique
peptides, whereas only 55 small proteins and 22 SEPs could be identified
using trypsin alone. For 27 small proteins and 12 SEPs, complete
sequence coverage was achieved. Moreover, for five SEPs, incorrectly
predicted translation start points or potential in vivo proteolytic processing were identified, confirming the data of a
previous top-down proteomics study of this organism. The results
clearly show that a multi-protease approach improves the identification
and molecular characterization of small proteins and SEPs. LC–MS
data are available via ProteomeXchange (identifier PXD023921).
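
The rationale for combining proteases can be illustrated with a minimal in silico digestion sketch. It is not the analysis pipeline used in this study. The cleavage rules are deliberately simplified (no missed cleavages, no proline exception, GluC restricted to glutamate), the 7 to 30 residue detectability window is an assumed placeholder, and the example sequence and all function names are hypothetical.

```python
# Simplified cleavage rules (assumptions, see text above).
# side "C": cleave after the listed residue; side "N": cleave before it.
RULES = {
    "trypsin": ("KR", "C"),
    "LysC": ("K", "C"),
    "GluC": ("E", "C"),
    "chymotrypsin": ("FYWL", "C"),
    "LysargiNase": ("KR", "N"),
}


def digest(seq, protease):
    """Return (start, end) index pairs of peptides from a complete digest."""
    residues, side = RULES[protease]
    cuts = {0, len(seq)}
    for i, aa in enumerate(seq):
        if aa in residues:
            cuts.add(i + 1 if side == "C" else i)
    bounds = sorted(cuts)
    return list(zip(bounds, bounds[1:]))


def covered_positions(seq, proteases, min_len=7, max_len=30):
    """Residue indices covered by peptides inside the assumed length window."""
    covered = set()
    for protease in proteases:
        for start, end in digest(seq, protease):
            if min_len <= end - start <= max_len:
                covered.update(range(start, end))
    return covered


def coverage(seq, proteases, min_len=7, max_len=30):
    """Fraction of the sequence covered by the pooled digests."""
    return len(covered_positions(seq, proteases, min_len, max_len)) / len(seq)


if __name__ == "__main__":
    # Hypothetical small-protein sequence, for illustration only.
    seq = "MSKEIVRAAFELGKDTWNPYLEKRGSAVLEFKQ"
    print("trypsin only :", round(coverage(seq, ["trypsin"]), 2))
    print("all proteases:", round(coverage(seq, list(RULES)), 2))
```

Because coverage is computed from the union of all digests, pooling proteases can only keep or increase it relative to a single enzyme, which is why short sequences with few tryptic sites benefit most.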