Protein identification by tandem mass spectrometry is based on the reliable processing of the acquired data. Unfortunately, the generation of a large number of poor quality spectra is commonly observed in LC-MS/MS, and the processing of these mostly noninformative spectra with its associated costs should be avoided. We present a continuous quality score that can be computed very quickly and that can be considered an approximation of the MASCOT score in case of a correct identification. This score can be used to reject low quality spectra prior to database identification, or to draw attention to those spectra that exhibit a (supposedly) high information content, but could not be identified. The proposed quality score can be calibrated automatically on site without the need for a manually generated training set. When this score is turned into a classifier and when features are used that are independent of the instrument, the proposed approach performs equally to previously published classifiers and feature sets and also gives insights into the behavior of the MASCOT score.
Knowledge of the biologically relevant components of human tissues has enabled the invention of numerous clinically useful diagnostic tests, as well as non-invasive ways of monitoring disease and its response to treatment. Recent use of advanced MS-based proteomics revealed that the composition of human urine is more complex than anticipated. Here, we extend the current characterization of the human urinary proteome by extensively fractionating urine using ultracentrifugation, gel electrophoresis, ion exchange and reverse-phase chromatography, effectively reducing mixture complexity while minimizing loss of material. By using high-accuracy mass measurements of the linear ion trap-Orbitrap mass spectrometer and LC-MS/MS of peptides generated from such extensively fractionated specimens, we identified 2362 proteins in routinely collected individual urine specimens, including more than 1000 proteins not described in previous studies. Many of these are biomedically significant molecules, including glomerularly filtered cytokines and shed cell surface molecules, as well as renally and urogenitally produced transporters and structural proteins. Annotation of the identified proteome reveals distinct patterns of enrichment, consistent with previously described specific physiologic mechanisms, including 336 proteins that appear to be expressed by a variety of distal organs and glomerularly filtered from serum. Comparison of the proteomes identified from 12 individual specimens revealed a subset of generally invariant proteins, as well as individually variable ones, suggesting that our approach may be used to study individual differences in age, physiologic state and clinical condition. Consistent with this, annotation of the identified proteome by using machine learning and text mining exposed possible associations with 27 common and more than 500 rare human diseases, establishing a widely useful resource for the study of human pathophysiology and biomarker discovery.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.