We report O-Pair Search, a new approach to identify O-glycopeptides and localize O-glycosites. Using paired collision- and electron-based dissociation spectra, O-Pair Search identifies O-glycopeptides using an ion-indexed open modification search and localizes O-glycosites using graph theory and probability-based localization. O-Pair Search reduces search times more than 2,000-fold compared to current O-glycopeptide processing software, while defining O-glycosite localization confidence levels and generating more O-glycopeptide identifications. O-Pair Search is freely available at https://github.com/smith-chem-wisc/MetaMorpheus.
Main Text
Mass spectrometry (MS) is the gold standard for interrogating the glycoproteome, enabling the localization of glycans to specific glycosites 1-3. Recent applications of electron-driven dissociation methods have shown promise in localizing modified O-glycosites even in multiply glycosylated peptides 4. Yet, standard approaches for interpreting tandem MS spectra are ill-suited for the heterogeneity of O-glycopeptides. Perhaps the most challenging problem for O-glycopeptide analysis is mucin-type O-glycosylation, which is abundant on many extracellular and secreted proteins and is a crucial mediator of immune function, microbiome interaction, and biophysical forces imposed on cells, among others 5. Mucin-type O-glycans are linked to serine and threonine residues through an initiating N-acetylgalactosamine (GalNAc) sugar, which can be further elaborated into four major core structures (cores 1-4) or remain truncated as terminal GalNAc (Tn) and sialyl-Tn antigens 6. These O-glycosites occur most frequently in long serine/threonine-rich sequences (Supplementary Fig. 1), such as PTS mucin tandem repeat domains, which exist with microheterogeneity defined by a large number of potential O-glycans 7. The number of serine and threonine residues present in glycopeptides derived from mucin-type O-glycoproteins, combined with the consideration of dozens of potential O-glycans at each site, leads to a combinatorial explosion when generating databases of theoretical O-glycopeptides to consider for each tandem MS spectrum (Supplementary Note 1).
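The scale of this combinatorial explosion can be illustrated with a short counting sketch. It is not the calculation from Supplementary Note 1 and is not part of O-Pair Search. It assumes a simplified model in which each serine or threonine carries at most one glycan, any glycan in the database may occupy any site, and a peptide carries at most a fixed number of glycans. The function name `count_glycoforms` and the example values (10 sites, 12 glycans, up to 3 glycans per peptide) are hypothetical and chosen only for illustration.

```python
from math import comb


def count_glycoforms(n_sites: int, n_glycans: int, max_glycans: int) -> int:
    """Count site-specific glycoforms of one peptide under a simplified model.

    Assumptions (illustrative only): each Ser/Thr site holds at most one
    glycan, every glycan type is allowed at every site, and at most
    `max_glycans` sites are occupied.
    """
    upper = min(max_glycans, n_sites)
    # Choose k occupied sites, then pick one of n_glycans for each of them.
    return sum(comb(n_sites, k) * n_glycans ** k for k in range(upper + 1))


if __name__ == "__main__":
    # Hypothetical peptide: 10 Ser/Thr sites, 12 candidate glycans, <= 3 glycans.
    print(count_glycoforms(10, 12, 3))  # 213961
```

Under these assumptions a single peptide with 10 candidate sites already yields more than 200,000 theoretical glycoforms, and the count grows roughly as the number of glycans raised to the number of occupied sites, which is why enumerating every candidate for each spectrum quickly becomes impractical.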
AUTHOR INFORMATION Affiliations