Methodologies to enrich heterogeneous types of phosphopeptides are critical for comprehensive mapping of the under-explored phosphoproteome. Taking advantage of the distinct binding affinities of Ga(3+) and Fe(3+) for phosphopeptides, we designed a metal-directed immobilized metal ion affinity chromatography for the sequential enrichment of phosphopeptides. In Raji B cells, the sequential Ga(3+)-Fe(3+)-immobilized metal affinity chromatography (IMAC) strategy displayed a 1.5-3.5-fold superior phosphoproteomic coverage compared to single IMAC (Fe(3+), Ti(4+), Ga(3+), and Al(3+)). In addition, up to 92% of the 6283 phosphopeptides were uniquely enriched in either the first Ga(3+)-IMAC (41%) or second Fe(3+)-IMAC (51%). The complementary properties of Ga(3+) and Fe(3+) were further demonstrated through the exclusive enrichment of almost all of 1214 multiply phosphorylated peptides (99.4%) in the Ga(3+)-IMAC, whereas only 10% of 5069 monophosphorylated phosphopeptides were commonly enriched in both fractions. The application of sequential Ga(3+)-Fe(3+)-IMAC to human lung cancer tissue allowed the identification of 2560 unique phosphopeptides with only 8% overlap. In addition to the above-mentioned mono- and multiply phosphorylated peptides, this fractionation ability was also demonstrated on the basic and acidic phosphopeptides: acidophilic phosphorylation sites were predominately enriched in the first Ga(3+)-IMAC (72%), while Pro-directed (85%) and basophilic (79%) phosphorylation sites were enriched in the second Fe(3+)-IMAC. This strategy provided complementary mapping of different kinase substrates in multiple cellular pathways related to cancer invasion and metastasis of lung cancer. Given the fractionation ability and ease of tip preparation of this Ga(3+)-Fe(3+)-IMAC, we propose that this strategy allows more comprehensive characterization of the phosphoproteome both in vitro and in vivo.
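The unique and shared percentages quoted above (41% first fraction only, 51% second fraction only, leaving about 8% shared) are simple set arithmetic over the identified phosphopeptides. The following is a minimal sketch of that bookkeeping, not the authors' pipeline; the function name and the toy peptide labels are hypothetical.

```python
def fraction_breakdown(first: set[str], second: set[str]) -> dict[str, float]:
    """Percentage of all identified peptides seen only in the first
    fraction, only in the second fraction, or in both.

    `first` and `second` are hypothetical sets of peptide identifiers,
    e.g. from a Ga(3+)-IMAC and a subsequent Fe(3+)-IMAC fraction.
    """
    union = first | second
    if not union:
        raise ValueError("no peptides identified in either fraction")
    total = len(union)
    return {
        "first_only": 100 * len(first - second) / total,
        "second_only": 100 * len(second - first) / total,
        "both": 100 * len(first & second) / total,
    }


# Toy data: four distinct peptides, one of them shared.
breakdown = fraction_breakdown({"a", "b", "c"}, {"c", "d"})
print(breakdown)
```

The three percentages always sum to 100, so the overlap can be read off as the remainder once the two unique fractions are known.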
N-linked glycosylation is one of the predominant post-translational modifications involved in a number of biological functions. Since experimental characterization of glycosites is challenging, glycosite prediction is crucial. Several predictors have been made available and report high performance. Most of them evaluate their performance at every asparagine in protein sequences, not confined to asparagine in the N-X-S/T sequon. In this paper, we present N-GlyDE, a two-stage prediction tool trained on rigorously-constructed non-redundant datasets to predict N-linked glycosites in the human proteome. The first stage uses a protein similarity voting algorithm trained on both glycoproteins and non-glycoproteins to predict a score for a protein to improve glycosite prediction. The second stage uses a support vector machine to predict N-linked glycosites by utilizing features of gapped dipeptides, pattern-based predicted surface accessibility, and predicted secondary structure. N-GlyDE’s final predictions are derived from a weight adjustment of the second-stage prediction results based on the first-stage prediction score. Evaluated on N-X-S/T sequons of an independent dataset comprised of 53 glycoproteins and 33 non-glycoproteins, N-GlyDE achieves an accuracy and MCC of 0.740 and 0.499, respectively, outperforming the compared tools. The N-GlyDE web server is available at http://bioapp.iis.sinica.edu.tw/N-GlyDE/.
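Restricting evaluation to N-X-S/T sequons, and scoring with the Matthews correlation coefficient (MCC), can be illustrated with a short sketch. This is not N-GlyDE's code: the function names are hypothetical, and the pattern assumes the usual convention that X is any residue except proline, which the abstract does not spell out.

```python
import math
import re

# Lookahead so that overlapping sequons (e.g. "NNTS") are all reported.
# Assumption: X in N-X-S/T may be any residue except proline.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence: str) -> list[int]:
    """Return 1-based positions of asparagines that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy sequence: N2 (NGS), N9 (NNT) and N10 (NTS) are sequon asparagines;
# N6 is followed by proline and is skipped.
print(find_sequons("MNGSANPTNNTS"))
```

A predictor evaluated this way is scored only at the positions `find_sequons` returns, rather than at every asparagine in the sequence.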
Membrane proteins are crucial targets for cancer biomarker discovery and drug development. However, in addition to the inherent challenges of hydrophobicity and low abundance, complete membrane proteome coverage of clinical specimens is usually hindered by the requirement for large amounts of starting material. Toward comprehensive membrane proteomic profiling of small amounts of sample (10 μg), we developed a high-pH reverse phase (Hp-RP) approach combined with the stop-and-go extraction tip (StageTip) technique as a fast (∼15 min), sensitive, reproducible, high-resolution and multiplexed fractionation method suitable for accurate quantification of the membrane proteome. This approach provided almost 2-fold enhanced detection of peptides encompassing transmembrane helix (TMH) domains, as compared with strong anion exchange (SAX) and strong cation exchange (SCX) StageTip techniques. Almost 5000 proteins (∼60% membrane proteins) can be identified in only 10 μg of membrane protein digests, showing the superior sensitivity of the Hp-RP StageTip approach. The method allowed up to 9- and 6-fold increases in the identification of unique hydrophobic and hydrophilic peptides, respectively. The Hp-RP StageTip method enabled in-depth membrane proteome profiling of 11 lung cancer cell lines harboring different EGFR mutation statuses, which resulted in the identification of 3983 annotated membrane proteins. This provides the largest collection of reference peptide spectral data for the lung cancer membrane subproteome. Finally, relative quantification of membrane proteins between gefitinib-resistant and -sensitive lung cancer cell lines revealed several up-regulated membrane proteins with key roles in lung cancer progression.
Phosphoproteomics can provide insights into cellular signaling dynamics. To achieve deep and robust quantitative phosphoproteomics profiling for minute amounts of sample, we here develop a global phosphoproteomics strategy based on data-independent acquisition (DIA) mass spectrometry and hybrid spectral libraries derived from data-dependent acquisition (DDA) and DIA data. Benchmarking the method using 166 synthetic phosphopeptides shows high sensitivity (<0.1 ng), accurate site localization and reproducible quantification (~5% median coefficient of variation). As a proof-of-concept, we use lung cancer cell lines and patient-derived tissue to construct a hybrid phosphoproteome spectral library covering 159,524 phosphopeptides (88,107 phosphosites). Based on this library, our single-shot streamlined DIA workflow quantifies 36,350 phosphosites (19,755 class 1) in cell line samples within two hours. Application to drug-resistant cells and patient-derived lung cancer tissues delineates site-specific phosphorylation events associated with resistance and tumor progression, showing that our workflow enables the characterization of phosphorylation signaling with deep coverage, high sensitivity and low between-run missing values.
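The reproducibility figure above is a median coefficient of variation (CV) across peptides. A minimal sketch of that statistic follows; it assumes the sample standard deviation and replicate intensities keyed by peptide, neither of which the abstract specifies, and the function names and numbers are illustrative only.

```python
from statistics import mean, median, stdev


def cv_percent(values: list[float]) -> float:
    """Coefficient of variation in percent, using the sample standard deviation."""
    return 100 * stdev(values) / mean(values)


def median_cv(intensities: dict[str, list[float]]) -> float:
    """Median CV over all peptides with at least two replicate measurements."""
    cvs = [cv_percent(v) for v in intensities.values() if len(v) >= 2]
    if not cvs:
        raise ValueError("need at least one peptide with two or more replicates")
    return median(cvs)


# Hypothetical replicate intensities for three peptides.
replicates = {
    "pep1": [9.0, 10.0, 11.0],
    "pep2": [10.0, 10.0, 10.0],
    "pep3": [18.0, 20.0, 22.0],
}
print(median_cv(replicates))
```

Peptides with a single measurement are skipped because a standard deviation is undefined for them.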
Despite significant efforts in the past decade towards complete mapping of the human proteome, 3564 proteins (neXtProt, 09-2014) are still “missing proteins”. Over one-third of these missing proteins are annotated as membrane proteins, owing to their relatively challenging accessibility with standard shotgun proteomics. Using non-small cell lung cancer (NSCLC) as a model, we aim to mine missing proteins from the disease-associated membrane proteome, which may still be largely under-represented. To increase identification coverage, we employed Hp-RP StageTip pre-fractionation of membrane-enriched samples from 11 NSCLC cell lines. Analysis of membrane samples from 20 pairs of tumor and adjacent normal lung tissue was incorporated to include physiologically expressed membrane proteins. Using multiple search engines (X!Tandem, Comet and Mascot) and stringent evaluation of FDR (MAYU and PeptideShaker), we identified 7702 proteins (66% membrane proteins) and 178 missing proteins (74 membrane proteins) with PSM-, peptide-, and protein-level FDR of 1%. Through multiple reaction monitoring (MRM) using synthetic peptides, we provided additional evidence for 8 missing proteins, including 7 with transmembrane helix (TMH) domains. This study demonstrates that mining missing proteins with a focus on the cancer membrane sub-proteome can greatly contribute to mapping the whole human proteome. All data were deposited into ProteomeXchange with the identifier PXD002224.
Although EGFR tyrosine kinase inhibitors (TKIs) have demonstrated good efficacy in non-small-cell lung cancer (NSCLC) patients harboring EGFR mutations, most patients develop intrinsic and acquired resistance. We quantitatively profiled the phosphoproteome and proteome of drug-sensitive and drug-resistant NSCLC cells under gefitinib treatment. The construction of a dose-dependent responsive kinase-substrate network of 1548 phosphoproteins and 3834 proteins revealed CK2-centric modules as the dominant core network for the potential gefitinib resistance-associated proteins. CK2 knockdown decreased cell survival in gefitinib-resistant NSCLC cells. Using motif analysis to identify the CK2 core sub-network, we verified that the elevated phosphorylation level of a CK2 substrate, HMGA1, was a critical node contributing to EGFR-TKI resistance in NSCLC cells. Either HMGA1 knockdown or mutation of the CK2 phosphorylation site of HMGA1, S102, reinforced the efficacy of gefitinib in resistant NSCLC cells through reactivation of the downstream signaling of EGFR. Our results delineate the TKI resistance-associated kinase-substrate network, suggesting a potential therapeutic strategy for overcoming TKI-induced resistance in NSCLC.
Human embryonic stem cells (hESCs) have the capacity for self-renewal and multilineage differentiation, which are of clinical importance for regenerative medicine. Despite significant progress in hESC research, the complete hESC proteome atlas, especially the surface protein composition, awaits delineation. According to the latest release of the neXtProt database (January 17, 2018; 19 658 PE1, 2, 3, and 4 human proteins), membrane proteins represent the major category (1047; 48%) among all 2186 missing proteins (MPs). We conducted a deep subcellular proteomics analysis of hESCs to identify the nuclear, cytoplasmic, and membrane proteins in hESCs and to mine missing membrane proteins in this very early cell state. To our knowledge, our study achieved the largest data set with confident identification of 11 970 unique proteins (1% false discovery rate at peptide, protein, and PSM levels), including the most comprehensive description of 6 138 annotated membrane proteins in hESCs. Following the HPP guidelines, we identified 26 gold (neXtProt PE2, 3, and 4 MPs) and 87 silver (potential MP candidates with a single unique peptide detected) MPs, of which 69 were membrane proteins, and the expression of 21 gold MPs was further verified either by multiple reaction monitoring mass spectrometry or by matching synthetic peptides in the Peptide Atlas database. Functional analysis of the MPs revealed their potential roles in the pluripotency-related pathways and the lineage- and tissue-specific differentiation processes. Our proteome map of hESCs may provide a rich resource not only for the identification of MPs in the human proteome but also for the investigation of self-renewal and differentiation of hESCs. All mass spectrometry data were deposited in ProteomeXchange via jPOST with identifier PXD009840.
Experimental evidence at the protein level from mass spectrometry and antibody experiments is essential to characterize the human proteome. neXtProt (2014-09 release) reported 20 055 human proteins, including 16 491 proteins identified at the protein level and 3564 unidentified proteins. Excluding 616 proteins at the uncertain level, 2948 proteins were regarded as missing proteins. Missing proteins remained unidentified partly because of MS limitations and intrinsic properties of the proteins, for example, appearing only in specific diseases or tissues. Beyond such reasons, it is desirable to explore issues affecting the validation of missing proteins in an "ideal" shotgun analysis of the human proteome. We thus performed in silico digestion of the human proteins to generate all fully digested peptides. With these presumed peptides, we investigated the identification of proteins without any unique peptide, the effect of sequence variants on protein identification, and the difficulties in identifying olfactory receptors and highly similar proteins. Among all proteins with evidence at the transcript level, G protein-coupled receptors and olfactory receptors, based on InterPro classification, were the largest protein families and exhibited more frequent variants. For identifying missing proteins, these analyses suggest including sequence variants in the protein FASTA file used for database searching. Furthermore, evidence of unique peptides identified from MS experiments would be crucial for experimentally validating missing proteins.
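The in silico digestion and unique-peptide analysis described above can be sketched as follows. The abstract does not name the protease or any length cutoff, so this sketch assumes trypsin with the common rule of cleaving after K or R unless the next residue is P, no missed cleavages, and a minimum peptide length of 7. The function names and toy sequences are hypothetical.

```python
import re
from collections import defaultdict


def tryptic_digest(sequence: str, min_len: int = 7) -> list[str]:
    """Fully digest a protein: cleave after K/R, but not before P (assumed rule)."""
    # Zero-width split points; requires Python 3.7+ for splitting on empty matches.
    peptides = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in peptides if len(p) >= min_len]


def unique_peptides(proteins: dict[str, str], min_len: int = 7) -> dict[str, list[str]]:
    """Map each protein accession to the peptides that occur in no other protein."""
    owners: dict[str, set[str]] = defaultdict(set)
    for accession, sequence in proteins.items():
        for peptide in tryptic_digest(sequence, min_len):
            owners[peptide].add(accession)
    unique: dict[str, list[str]] = {accession: [] for accession in proteins}
    for peptide, accessions in owners.items():
        if len(accessions) == 1:
            unique[next(iter(accessions))].append(peptide)
    return unique


# Toy database: P3 consists only of a peptide shared with P1 and P2,
# so it has no unique peptide and could not be identified unambiguously.
toy = {
    "P1": "AAAAAAAKCCCCCCCR",
    "P2": "AAAAAAAKDDDDDDDR",
    "P3": "AAAAAAAK",
}
print(unique_peptides(toy))
```

Proteins that map to an empty list correspond to the "proteins without any unique peptide" case; adding variant sequences to the input dictionary would show how variants change which peptides remain unique.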