Abstract: Background: Top-down homogeneous multiplexed tandem mass (HomMTM) spectra are generated from modified proteoforms of the same protein with different post-translational modification patterns. They are frequently observed in the analysis of ultramodified proteins, some proteoforms of which have similar molecular weights and cannot be well separated by liquid chromatography in mass spectrometry analysis. Results: We formulate the top-down HomMTM spectral identification problem as the minimum error k-splittable flow pr…
“…Characterization of proteoforms with PTMs or unknown mass shifts is also a challenging problem in TD-DIA-MS. For multiplexed DIA MS/MS spectra containing fragment ions from two proteoforms of two different proteins, fragment ions of the second proteoform may introduce errors in the characterization of the first proteoform. When a multiplexed MS/MS spectrum is generated from two proteoforms of the same protein, many fragment ions in the spectrum are shared by the two proteoforms, and proteoform characterization relies on only fragment ions that are unique for each proteoform 52 .…”
Top-down mass spectrometry is widely used for proteoform identification, characterization, and quantification owing to its ability to analyze intact proteoforms. In the last decade, top-down proteomics has been dominated by top-down data-dependent acquisition mass spectrometry (TD-DDA-MS), and top-down data-independent acquisition mass spectrometry (TD-DIA-MS) has not been well studied. While TD-DIA-MS produces complex multiplexed tandem mass spectrometry (MS/MS) spectra, which are challenging to confidently identify, it selects more precursor ions for MS/MS analysis and has the potential to increase proteoform identifications compared with TD-DDA-MS. Here we present TopDIA, the first software tool for proteoform identification by TD-DIA-MS. It generates demultiplexed pseudo MS/MS spectra from TD-DIA-MS data and then searches the pseudo MS/MS spectra against a protein sequence database for proteoform identification. We compared the performance of TD-DDA-MS and TD-DIA-MS using Escherichia coli K-12 MG1655 cells and demonstrated that TD-DIA-MS with TopDIA increased proteoform and protein identifications compared with TD-DDA-MS.
Section: Discussion (mentioning)
confidence: 99%
“…While we focused on the incorrect precursor masses due to unsuccessful deconvolution, it should be also noted that other sources of incorrect precursor mass assignment exist, in particular in TDP. Again, due to complex MS1 signal structure, it is well known that peaks from different proteoforms often coexist within the isolation window [5, 55], which is another source of incorrect precursor mass assignment. Our analysis of the chimeric spectra also confirms that up to 8.2% of identified spectra may be chimeric.…”
Top‐down proteomics (TDP) directly analyzes intact proteins and thus provides more comprehensive qualitative and quantitative proteoform‐level information than conventional bottom‐up proteomics (BUP) that relies on digested peptides and protein inference. While significant advancements have been made in TDP in sample preparation, separation, instrumentation, and data analysis, reliable and reproducible data analysis still remains one of the major bottlenecks in TDP. A key step for robust data analysis is the establishment of an objective estimation of proteoform‐level false discovery rate (FDR) in proteoform identification. The most widely used FDR estimation scheme is based on the target‐decoy approach (TDA), which has primarily been established for BUP. We present evidence that the TDA‐based FDR estimation may not work at the proteoform‐level due to an overlooked factor, namely the erroneous deconvolution of precursor masses, which leads to incorrect FDR estimation. We argue that the conventional TDA‐based FDR in proteoform identification is in fact protein‐level FDR rather than proteoform‐level FDR unless precursor deconvolution error rate is taken into account. To address this issue, we propose a formula to correct for proteoform‐level FDR bias by combining TDA‐based FDR and precursor deconvolution error rate.
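As a point of reference for the abstract above, the following is a minimal sketch of the conventional target-decoy FDR estimate that it takes as its starting point. It assumes the common estimator in which the number of decoy hits at or above a score threshold is divided by the number of target hits at or above the same threshold. The function name `tda_fdr`, the `(score, is_decoy)` tuple layout, and the toy scores are hypothetical and chosen only for illustration. The correction formula proposed in the abstract, which combines this estimate with the precursor deconvolution error rate, is not given here and is not reproduced.

```python
from typing import Iterable, Tuple


def tda_fdr(matches: Iterable[Tuple[float, bool]], threshold: float) -> float:
    """Estimate FDR as (#decoy hits) / (#target hits) at or above a threshold.

    Each match is a (score, is_decoy) pair; higher scores are assumed better.
    This is the plain target-decoy estimate, not a corrected proteoform-level FDR.
    """
    targets = 0
    decoys = 0
    for score, is_decoy in matches:
        if score >= threshold:
            if is_decoy:
                decoys += 1
            else:
                targets += 1
    if targets == 0:
        # No accepted target hits: report 0.0 rather than dividing by zero.
        return 0.0
    # Cap at 1.0 so the estimate stays a valid rate.
    return min(1.0, decoys / targets)


# Hypothetical toy identifications: (score, is_decoy).
matches = [
    (9.1, False),
    (8.7, False),
    (8.2, True),
    (7.9, False),
    (7.5, False),
    (6.0, True),
    (5.5, False),
]

print(tda_fdr(matches, 7.0))  # 4 targets, 1 decoy
print(tda_fdr(matches, 5.0))  # 5 targets, 2 decoys
```

Lowering the threshold admits more decoy hits relative to target hits, which is why the estimate rises from the first call to the second. Note that this estimate says nothing about whether the precursor mass assigned to each accepted match is correct, which is the gap the abstract highlights.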
“…Such spectral complexity makes quantitation by simple comparison of the fragment-ion relative ratio (FIRR) difficult. To tackle such challenges, more advanced algorithms, such as the mixed-integer linear optimization algorithm proposed by DiMaggio et al. and the graph-based algorithm proposed by Zhu and Liu, have been introduced to calculate the abundance ratio of proteoforms that could best fit the observed spectrum. Since bioinformatic tools based on these algorithms are not publicly available and constructing one from scratch is beyond our expertise, we did not attempt to conduct quantitative analysis on the intact histone H3 proteoforms.…”
The heterogeneity of histone H3 proteoforms makes histone H3 top-down analysis challenging. To enhance the detection coverage of the proteoforms, performing liquid chromatography (LC) front-end to mass spectrometry (MS) detection is recommended. Here, using optimized electron-transfer/high-energy collision dissociation (EThcD) parameters, we have conducted a proteoform-spectrum match (PrSM)-level side-by-side comparison of reversed-phase LC-MS (RPLC-MS), “dual-gradient” weak cation-exchange/hydrophilic interaction LC-MS (dual-gradient WCX/HILIC-MS), and “organic-rich” WCX/HILIC-MS on the top-down analyses of H3.1, H3.2, and H4 proteins extracted from a HeLa cell culture. While both dual-gradient WCX/HILIC and organic-rich WCX/HILIC could resolve intact H3 and H4 proteoforms by the number of acetylations, the organic-rich method could enhance the separations of different trimethyl/acetyl near-isobaric H3 proteoforms. In comparison with RPLC-MS, both of the WCX/HILIC-MS methods enhanced the qualities of the H3 PrSMs and remarkably improved the range, reproducibility, and confidence in the identifications of H3 proteoforms.