Traditional proteomics analysis is plagued by the use of arbitrary
thresholds, resulting in a large loss of information. We propose here
a novel proteomics method that utilizes all detected proteins.
We demonstrate its efficacy in a proteomics screen of 5 moderate-stage
and 7 late-stage liver cancer patients. Utilizing
biological complexes as a cluster vector, and augmenting it with submodules
obtained from partitioning an integrated and cleaned protein–protein
interaction network, we calculate a Proteomics Signature Profile (PSP)
for each patient based on the hit rates of their reported proteins,
in the absence of fold change thresholds, against the cluster vector.
Using these profiles, we demonstrate that moderate- and late-stage patients
segregate with high confidence. We also discovered a moderate-stage
patient who displayed a proteomics profile similar to those of the late-stage
patients. We identified significant clusters using a modified version
of the SNet approach. Comparing our results against those of the Proteomics
Expansion Pipeline (PEP), which was applied to the same patient data,
we found good correlation. Building on this finding, we report significantly
more clusters (176 clusters here compared to 70 in PEP), demonstrating
the sensitivity of this approach. Gene Ontology (GO) term analysis
also reveals that the significant clusters are functionally congruent
with the liver cancer phenotype. PSP is a powerful and sensitive method
for analyzing proteomics profiles even when sample sizes are small.
It relies not on ratio scores but on whether a protein
is detected. Although consistency of individual proteins between
patients is low, we found that the reported proteins tend to hit clusters
in a meaningful and informative manner. By extracting this information
in the form of a Proteomics Signature Profile, we confirm that this
information is conserved and can be used for (1) clustering of patient
samples, (2) identification of significant clusters based on real
biological complexes, and (3) overcoming consistency and coverage
issues prevalent in proteomics data sets.
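
The profile calculation described above can be illustrated with a minimal sketch. It assumes that the hit rate of a cluster is the fraction of its member proteins detected in a patient, and it compares profiles with a Euclidean distance; both choices, along with all protein, cluster, and patient names, are hypothetical and are not taken from the study.

```python
import math
from typing import Dict, List, Set


def signature_profile(detected: Set[str], clusters: Dict[str, Set[str]]) -> List[float]:
    """Return one hit rate per cluster, in the order the clusters are given.

    Assumption: hit rate = (detected members of the cluster) / (cluster size).
    Only presence or absence is used; no fold change threshold is applied.
    """
    profile = []
    for members in clusters.values():
        hit_rate = len(detected & members) / len(members) if members else 0.0
        profile.append(hit_rate)
    return profile


def profile_distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two profiles (an illustrative choice)."""
    return math.dist(a, b)


# Hypothetical cluster vector: complexes plus network-derived submodules.
CLUSTERS = {
    "complex_A": {"P1", "P2", "P3", "P4"},
    "complex_B": {"P5", "P6"},
    "submodule_C": {"P7", "P8", "P9", "P10"},
}

# Hypothetical detected-protein sets; the two moderate-stage patients
# share no individual protein.
PATIENTS = {
    "moderate_1": {"P1", "P2", "P5"},
    "moderate_2": {"P3", "P4", "P6"},
    "late_1": {"P5", "P7", "P8", "P9"},
}

PROFILES = {name: signature_profile(p, CLUSTERS) for name, p in PATIENTS.items()}

if __name__ == "__main__":
    for name, profile in PROFILES.items():
        print(name, profile)
    print(profile_distance(PROFILES["moderate_1"], PROFILES["moderate_2"]))
    print(profile_distance(PROFILES["moderate_1"], PROFILES["late_1"]))
```

In this toy data the two moderate-stage patients have no detected protein in common, yet their profiles are identical at the cluster level, which mirrors the point that cluster hit rates can be conserved even when individual proteins are not.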