Protein identification by tandem mass spectrometry (MS/MS) is key to most proteomics projects and has been widely explored in bioinformatics research. Obtaining good and reliable identification results has important implications for biological and clinical work. Although the field is mature, automated software identification of proteins from MS/MS data still faces a number of obstacles arising from the complexity of the proteome or from procedural issues in mass spectrometry data acquisition. Expected or unexpected modifications of the peptide sequences, polymorphisms, errors in databases, missed or non-specific cleavages, unusual fragmentation patterns, and single MS/MS spectra containing multiple peptides of the same m/z are among the many pitfalls for identification algorithms. Much research in recent years has yielded new strategies to handle several of these issues. Multiple MS/MS identification algorithms are now available or have been described theoretically. The difficulty lies in choosing the most suitable method for each type of spectrum being identified. This review presents an overview of state-of-the-art bioinformatics approaches to protein identification by MS/MS, to help the reader do the spadework of finding the right tools among the many options available.
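Missed cleavages, one of the pitfalls listed above, are usually handled by letting the in silico digestion of each database protein skip a limited number of cleavage sites. The Python sketch below is a minimal illustration under stated assumptions: it uses the simplified trypsin rule (cleave after K or R, but not before P), and the function names are hypothetical rather than taken from any tool covered by the review.

```python
def tryptic_cut_sites(protein):
    """Cut positions under a simplified trypsin rule: after K or R, not before P.

    Returns indices into `protein`, always including 0 and len(protein).
    """
    cuts = [0]
    for i in range(len(protein) - 1):
        if protein[i] in "KR" and protein[i + 1] != "P":
            cuts.append(i + 1)
    cuts.append(len(protein))
    return cuts


def digest(protein, missed_cleavages=1):
    """All peptides spanning at most `missed_cleavages` internal cut sites."""
    cuts = tryptic_cut_sites(protein)
    peptides = []
    for start in range(len(cuts) - 1):
        # Allow the peptide to run past up to `missed_cleavages` extra sites.
        last = min(start + 1 + missed_cleavages, len(cuts) - 1)
        for end in range(start + 1, last + 1):
            peptides.append(protein[cuts[start]:cuts[end]])
    return peptides


# Hypothetical protein; note that R followed by P is not cleaved.
print(digest("ACDKEFGRPHIKLM", missed_cleavages=0))
# ['ACDK', 'EFGRPHIK', 'LM']
print(digest("ACDKEFGRPHIKLM", missed_cleavages=1))
# ['ACDK', 'ACDKEFGRPHIK', 'EFGRPHIK', 'EFGRPHIKLM', 'LM']
```

Each extra allowed missed cleavage adds longer peptides to the candidate list; allowing fully non-specific cleavage amounts to considering every subsequence, which enlarges the search space much more.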
Protein-protein interactions are key to the function and regulation of many biological pathways. To facilitate the characterization of protein-protein interactions using mass spectrometry, a new data acquisition/analysis pipeline was designed. The goal of this pipeline was to provide a generic strategy for identifying crosslinked peptides from single LC/MS/MS datasets, without using specialized crosslinkers or custom-written software. To achieve this, each peptide in the pair of crosslinked peptides was considered to be “post-translationally” modified with an unknown mass at an unknown amino acid. This allowed the use of an open-modification search engine, Popitam, to interpret the tandem mass spectra of crosslinked peptides. False positives were reduced and database selectivity increased by acquiring precursors and fragments at high mass accuracy. Additionally, a high-charge-state-driven data acquisition scheme was used to enrich datasets for crosslinked peptides. This pipeline, based on open-modification searching, was shown to be useful for characterizing both chemical and native crosslinks in proteins. The pipeline was validated by characterizing the known interactions in the chemically crosslinked CYP2E1-b5 complex. The utility of this method for identifying native crosslinks was demonstrated by mapping disulfide bridges in RcsF, an outer membrane lipoprotein involved in the Rcs phosphorelay.
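The arithmetic behind treating one peptide of a crosslinked pair as a "modification" of the other can be sketched in a few lines. Assuming neutral monoisotopic masses, a crosslinked precursor weighs as much as the two linear peptides plus a linker-specific mass change (about -2.016 Da for a disulfide bridge, which removes two hydrogen atoms). The unknown shift an open-modification search must place on one peptide therefore equals the partner's mass plus that linker change. This is an illustrative sketch with hypothetical peptides and helper names, not Popitam's implementation.

```python
# Standard monoisotopic residue masses (Da); cysteine is left unmodified.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added once per linear peptide
H_ATOM = 1.007825   # hydrogen atom
PROTON = 1.007276   # charge carrier in positive-mode ESI

# A disulfide bridge joins two cysteines and removes two hydrogen atoms.
DISULFIDE_DELTA = -2 * H_ATOM


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE[aa] for aa in seq) + WATER


def neutral_mass(mz, charge):
    """Neutral precursor mass from an observed m/z and its charge state."""
    return mz * charge - charge * PROTON


def crosslinked_mass(pep_a, pep_b, linker_delta):
    """Neutral mass of two peptides joined by a link that adds linker_delta Da."""
    return peptide_mass(pep_a) + peptide_mass(pep_b) + linker_delta


def unknown_shift(precursor_mass, candidate):
    """Mass an open-modification search would have to place on `candidate`."""
    return precursor_mass - peptide_mass(candidate)


# Hypothetical disulfide-linked pair observed as a 3+ precursor.
pep_a, pep_b = "ACK", "LCR"
mz = (crosslinked_mass(pep_a, pep_b, DISULFIDE_DELTA) + 3 * PROTON) / 3
shift = unknown_shift(neutral_mass(mz, 3), pep_a)
# The shift on ACK equals mass(LCR) plus the disulfide change.
print(f"{shift:.4f} Da")  # prints 388.1893 Da
```

For a chemical crosslinker the same arithmetic applies with that reagent's mass in place of `DISULFIDE_DELTA`; the search engine only has to find a residue on the candidate where a shift of this size explains the fragment ions.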
Bioinformatics tools for proteomics, also called proteome informatics tools, today span a wide range of applications, from simple tools that compare protein amino acid compositions to sophisticated software for large-scale protein structure determination. This review considers the available, ready-to-use tools that can help end users interpret and validate their experimental data and derive biological information from it. It concentrates on bioinformatics tools for 2-DE analysis, for LC followed by MS analysis, for protein identification by PMF, by peptide fragment fingerprinting and by de novo sequencing, and for quantitation from MS data. It also describes initiatives that aim to automate MS analysis and improve the quality of the results obtained.
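As a rough illustration of the idea behind protein identification by PMF, the sketch below counts how many observed peptide masses fall within a ppm tolerance of each protein's theoretical peptide masses, then ranks proteins by that count. The protein names, masses and the 10 ppm default are hypothetical, and real PMF tools use more elaborate, probability-based scores rather than a raw match count.

```python
def ppm(observed, theoretical):
    """Absolute mass error in parts per million."""
    return abs(observed - theoretical) / theoretical * 1e6


def pmf_score(observed_masses, theoretical_masses, tol_ppm=10.0):
    """Number of observed peptide masses explained by one protein's digest."""
    return sum(
        1 for m in observed_masses
        if any(ppm(m, t) <= tol_ppm for t in theoretical_masses)
    )


def rank_proteins(observed_masses, database, tol_ppm=10.0):
    """(protein, score) pairs sorted by decreasing match count."""
    scores = [(name, pmf_score(observed_masses, masses, tol_ppm))
              for name, masses in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical theoretical peptide masses (Da) for two database entries.
database = {
    "protA": [1000.000, 1500.500, 2000.200],
    "protB": [1000.000, 1800.900],
}
observed = [1000.005, 1500.510, 2100.000]
print(rank_proteins(observed, database))
# [('protA', 2), ('protB', 1)]
```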
In recent years, proteomics research has gained importance thanks to increasingly powerful techniques in protein purification, mass spectrometry and identification, and to the development of extensive protein and DNA databases for various organisms. Nevertheless, current methods for identification from spectrometric data have difficulty handling modifications or mutations in the source peptide. Moreover, they perform poorly when run on large databases (such as genomic databases) or on low-quality data, for example because of poor calibration or weak fragmentation of the source peptide. We present a new algorithm dedicated to automated protein identification from tandem mass spectrometry (MS/MS) data by searching a peptide sequence database. Our identification approach shows promising properties for addressing the specific difficulties listed above. It consists of matching theoretical peptide sequences drawn from a database against a structured representation of the source MS/MS spectrum. This representation is similar to the spectrum graphs commonly used by de novo sequencing software. The identification process parses the graph to highlight the sections relevant to each theoretical sequence and produces a list of peptides ranked by a correlation score. Parsing the graph, which can be a highly combinatorial task, is performed by a bio-inspired algorithm, the Ant Colony Optimization algorithm.
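A minimal sketch of a spectrum graph of the kind described above follows. For simplicity it assumes that every peak is a singly charged b ion. Each peak becomes a prefix residue mass node, nodes are added for 0 and for the full residue mass of the precursor, and two nodes are linked when their mass difference matches one amino acid residue within an absolute tolerance. A hypothetical `supported_residues` helper then counts how many residues of a candidate database sequence are backed by an edge. All names and the 0.02 Da tolerance are assumptions; the sketch does not reproduce the authors' graph construction, their correlation score, or the Ant Colony Optimization used to parse the graph.

```python
from itertools import combinations

# Standard monoisotopic residue masses (Da).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276


def build_spectrum_graph(b_ion_mz, precursor_residue_mass, tol=0.02):
    """Nodes are prefix residue masses (peak m/z minus a proton) plus 0 and the
    summed residue mass of the precursor; an edge (i, j, aa) links two nodes
    whose mass difference matches residue `aa` within `tol` Da."""
    nodes = sorted({0.0, precursor_residue_mass, *(mz - PROTON for mz in b_ion_mz)})
    edges = []
    for i, j in combinations(range(len(nodes)), 2):
        gap = nodes[j] - nodes[i]  # non-negative because nodes are sorted
        for aa, mass in RESIDUE.items():
            if abs(gap - mass) <= tol:
                edges.append((i, j, aa))
    return nodes, edges


def node_at(nodes, mass, tol=0.02):
    """Index of the first node within tol Da of `mass`, or None."""
    for k, node_mass in enumerate(nodes):
        if abs(node_mass - mass) <= tol:
            return k
    return None


def supported_residues(seq, nodes, edges, tol=0.02):
    """Count residues of a candidate sequence that follow a graph edge."""
    edge_set = set(edges)
    count, prefix = 0, 0.0
    for aa in seq:
        start = node_at(nodes, prefix, tol)
        prefix += RESIDUE[aa]
        end = node_at(nodes, prefix, tol)
        if start is not None and end is not None and (start, end, aa) in edge_set:
            count += 1
    return count


# Hypothetical b ions of GAVK; the G+A+V+K residue masses sum to 355.22194 Da.
nodes, edges = build_spectrum_graph([58.0287, 129.0658, 228.1343], 355.22194)
print(edges)  # [(0, 1, 'G'), (0, 2, 'Q'), (1, 2, 'A'), (2, 3, 'V'), (3, 4, 'K')]
print(supported_residues("GAVK", nodes, edges))  # 4
print(supported_residues("GAVR", nodes, edges))  # 3
```

The edge labelled Q from node 0 to the b2 node appears because Gly + Ala (128.0586 Da) is isobaric with Gln, the kind of ambiguity that scoring each candidate sequence against the graph has to cope with.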
The advantages and disadvantages of acquiring tandem mass spectra by collision-induced dissociation (CID) of peptides in hybrid linear ion trap Fourier-transform instruments are described. These instruments can transfer fragment ions from the linear ion trap to the FT-based analyzer for analysis at both high resolution and high mass accuracy. In addition, in instruments containing an additional collision cell (i.e., the "C-trap" in the LTQ-Orbitrap), CID can also be performed during the transfer of ions from the linear ion trap (LTQ) to the FT analyzer, yielding tandem mass spectra that cover the full m/z range and are not limited by the ejection q value of the LTQ. Our results show that these scan modes have lower duty cycles than tandem mass spectra acquired in the LTQ at nominal mass resolution, and typically result in fewer peptide identifications during data-dependent analysis of complex samples. However, the higher measured mass accuracy and resolution provide more specificity and hence a lower false-positive rate for the same number of true positives during database searches of peptide tandem mass spectra. In addition, this data acquisition mode greatly facilitates searches for modified and unexpected peptides. We therefore conclude that acquiring tandem mass spectra with high measured mass accuracy and resolution is a competitive alternative to "classical" data acquisition strategies, especially for complex searches against large databases, searches for modified peptides, or searches for peptides resulting from non-specific cleavages.

... analyzers were introduced. These instruments greatly expand the opportunities for experimental design because they combine two mass analyzers working in series. It is also important to note that, even with external calibration, they routinely deliver mass accuracies in the 2-5 ppm range.
Furthermore, even better mass accuracies, down to 1-2 ppm, have been demonstrated by using the mass difference between adjacent peptide fragment ions in the FT-ICR analyzer [3] and by injecting a normalized, stable amount of calibrant into the Orbitrap analyzer [4]. The mass accuracy of these instruments is therefore similar to that of the earlier-generation sector instruments used at the beginning of the era of mass-spectrometry-based peptide sequencing [5], but now at unprecedented sensitivity and data acquisition rates. While the community has embraced these new instrument platforms capable of very high mass accuracy, little attention has been paid to what is actually gained from this additional information, especially when it is acquired on fragment ions.

Unlike other hybrid mass spectrometers such as Q-TOFs [6] or Q-FTICRs [7], which have a single detector, the LTQ-FT and LTQ-OT instruments allow parallel data acquisition in both mass analyzers through the use of dual detectors. This silent revolution in mass analyzers opens up new data acquisition schemes. Typically, the FT-based analyzer performs a survey scan of MS1 ions while the linear ion trap ...
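To make the specificity argument concrete, the sketch below compares a peptide with a variant differing only by Lys versus Gln, two residues whose monoisotopic masses differ by about 0.036 Da. The observed spectrum is taken to be the exact singly charged b and y ions of GAVK (a hypothetical example). A deliberately wide 500 ppm window, standing in for low-accuracy fragment data, cannot tell GAVK from GAVQ, whereas a 5 ppm window, in line with the ppm-level accuracies quoted above, leaves only the b ions shared by both sequences matching GAVQ. The peptides, tolerances and helper names are illustrative assumptions, and the model ignores neutral losses, multiply charged fragments and isotope peaks.

```python
# Standard monoisotopic residue masses (Da).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def fragment_ions(seq):
    """Singly charged b and y ions (no neutral losses, no higher charges)."""
    b_ions, y_ions = [], []
    for i in range(1, len(seq)):
        b_ions.append(sum(RESIDUE[aa] for aa in seq[:i]) + PROTON)
        y_ions.append(sum(RESIDUE[aa] for aa in seq[i:]) + WATER + PROTON)
    return b_ions, y_ions


def count_matches(peaks, seq, tol_ppm):
    """Number of observed peaks within tol_ppm of any b or y ion of seq."""
    b_ions, y_ions = fragment_ions(seq)
    theoretical = b_ions + y_ions
    return sum(
        1 for peak in peaks
        if any(abs(ppm_error(peak, t)) <= tol_ppm for t in theoretical)
    )


# Hypothetical spectrum: the exact b and y ions of GAVK.
b, y = fragment_ions("GAVK")
peaks = b + y
for tol in (500.0, 5.0):
    # columns: tolerance, matches for GAVK, matches for GAVQ
    print(tol, count_matches(peaks, "GAVK", tol), count_matches(peaks, "GAVQ", tol))
# prints: 500.0 6 6, then 5.0 6 3
```

At 5 ppm the three y ions separate the two candidates, which is the kind of gain in database-search specificity the text attributes to high-accuracy fragment spectra.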