High‐grade serous ovarian carcinoma (HGSOC) is the most common subtype of ovarian cancer, with 5‐year survival rates below 40%. Neoadjuvant chemotherapy (NACT) followed by interval debulking surgery (IDS) is recommended for patients with advanced‐stage HGSOC unsuitable for primary debulking surgery (PDS). However, about 40% of patients receiving this treatment exhibit chemoresistance whose molecular mechanisms and predictability remain uncertain. Here, we built a high‐quality ovary‐specific spectral library containing 130 735 peptides and 10 696 proteins on Orbitrap instruments. Compared to a published DIA pan‐human spectral library (DPHL), this spectral library provides 10% more ovary‐specific and 3% more ovary‐enriched proteins. This library was then applied to analyze data‐independent acquisition (DIA) data of tissue samples from an HGSOC cohort treated with NACT, leading to 10 070 quantified proteins, 9.73% more than with DPHL. We further established a six‐protein classifier by parallel reaction monitoring (PRM) to effectively predict resistance to additional chemotherapy after IDS (Log‐rank test, P = 0.002). The classifier was validated with 57 patients from an independent clinical center (P = 0.014). Thus, we have developed an ovary‐specific spectral library for targeted proteome analysis, and propose a six‐protein classifier that could potentially predict chemoresistance in HGSOC patients after NACT‐IDS treatment.
When identifying phenotype-specific or differentially expressed proteins from proteomic data, a standard workflow consists of five key steps: raw data quantification, expression matrix construction, matrix normalization, missing data imputation, and differential expression analysis. However, because multiple options are available at each step, selecting ad hoc combinations of options can result in suboptimal analysis. To address this, we conducted an extensive study involving 10,808 experiments to compare the performance of exhaustive option combinations for each step across 12 gold standard spike-in datasets and three quantification platforms: FragPipe, MaxQuant, and DIA-NN. By applying frequent pattern mining to the data from these experiments, we discovered high-performing rules for selecting optimal workflows. These rules included avoiding normalization, using MinProb for missing value imputation, and using limma for differential expression analysis. We found that workflow performances were predictable and could be accurately categorized, with average F1 scores and Matthews correlation coefficients both exceeding 0.79 in 10-fold cross-validations. Furthermore, by integrating the top-ranked workflows through ensemble inference, we not only improved the accuracy of differential expression analysis (e.g., achieving a 1-5% gain under five performance metrics for FragPipe), but also enhanced the workflow's ability to aggregate proteomic information across various levels, including peptide and protein level intensities and spectral counts, providing a comprehensive perspective on the data. Overall, our study highlights the importance of selecting optimal workflow combinations and demonstrates the benefits of ensemble inference in improving both the accuracy and comprehensiveness of proteomic data analysis.
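As an illustration of the two metrics named above, the following minimal sketch computes an F1 score and a Matthews correlation coefficient for a set of differential expression calls scored against spike-in ground truth. The protein identifiers, the toy labels, and the function names (`confusion_counts`, `f1_score`, `mcc`) are hypothetical and chosen only for this example; they are not taken from the study, and the abstract does not state exactly how the metrics were applied.

```python
import math

# Hypothetical ground truth: True means the protein was spiked in
# at different amounts and should be called differential.
TRUTH = {
    "P1": True, "P2": True, "P3": True, "P4": True,
    "P5": False, "P6": False, "P7": False,
    "P8": False, "P9": False, "P10": False,
}

# Hypothetical calls from one workflow: True means called differential.
CALLED = {
    "P1": True, "P2": True, "P3": True, "P4": False,
    "P5": True, "P6": False, "P7": False,
    "P8": False, "P9": False, "P10": False,
}


def confusion_counts(truth, called):
    """Return (TP, FP, FN, TN) over the proteins listed in `truth`.

    Proteins missing from `called` are treated as not called.
    """
    tp = fp = fn = tn = 0
    for protein, is_true in truth.items():
        is_called = called.get(protein, False)
        if is_true and is_called:
            tp += 1
        elif is_called:
            fp += 1
        elif is_true:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 if undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; 0.0 if the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    tp, fp, fn, tn = confusion_counts(TRUTH, CALLED)
    print("F1 :", round(f1_score(tp, fp, fn), 3))
    print("MCC:", round(mcc(tp, fp, fn, tn), 3))
```

Unlike F1, the Matthews correlation coefficient also takes true negatives into account, which matters in spike-in designs where most proteins are unchanged background.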
CCOC is a relatively rare subtype of ovarian cancer with a high degree of
resistance to standard chemotherapy. Little is known about the
underlying molecular mechanisms, and it remains a challenge to predict
its prognosis after chemotherapy. We analyzed the proteome of CCOC
tissue samples from two independent cohorts using DIA-MS. A total of
8697 proteins were characterized in the first cohort (H1 cohort, 32
patients, 35 FFPE samples) and 9409 proteins in the second cohort (H2
cohort, 24 patients, 28 FF samples). After bioinformatics analysis, we
narrowed our focus to 15 proteins significantly correlated with RFS in
both cohorts. These proteins are mainly involved in DNA damage response,
extracellular matrix, and mitochondrial metabolism. We further developed
a 13-protein model to predict the prognosis of patients with CCOC in the
H2 cohort, and validated the model in the H1 cohort using both DIA and PRM
data. Finally, we verified the modulated pathways from our CCOC
proteomic dataset in several published CCOC transcriptome and proteome
datasets. Taken together, this study presents a CCOC proteomic data
resource and a promising 13-protein panel which could potentially
predict the recurrence and survival of CCOC.