Current T cell epitope prediction tools are a valuable resource in designing targeted immunogenicity experiments. They typically focus on, and are able to accurately predict, peptide binding and presentation by major histocompatibility complex (MHC) molecules on the surface of antigen-presenting cells. However, recognition of the peptide-MHC complex by a T cell receptor (TCR) is often not included in these tools. We developed a classification approach based on random forest classifiers to predict recognition of a peptide by a T cell receptor and discover patterns that contribute to recognition. We considered two approaches to solve this problem: (1) distinguishing between two sets of TCRs that each bind to a known peptide and (2) retrieving TCRs that bind to a given peptide from a large pool of TCRs. Evaluation of the models on two HIV-1, B*08-restricted epitopes reveals good performance and hints towards structural CDR3 features that can determine peptide immunogenicity. These results are of particular importance as they show that prediction of T cell epitope and T cell epitope recognition based on sequence data is a feasible approach. In addition, the validity of our models not only serves as a proof of concept for the prediction of immunogenic T cell epitopes but also paves the way for more general and high-performing models.
Despite the increasing importance of non-targeted metabolomics to answer various life science questions, extracting biochemically relevant information from metabolomics spectral data is still an incompletely solved problem. Most computational tools to identify tandem mass spectra focus on a limited set of molecules of interest. However, such tools are typically constrained by the availability of reference spectra or molecular databases, limiting their applicability for generating structural hypotheses for unknown metabolites. In contrast, recent advances in the field illustrate the possibility to expose the underlying biochemistry without relying on metabolite identification, in particular via substructure prediction. We describe an automated method for substructure recommendation motivated by association rule mining. Our framework captures potential relationships between spectral features and substructures learned from public spectral libraries. These associations are used to recommend substructures for any unknown mass spectrum. Our method does not require any predefined metabolite candidates, and therefore it can be used for the hypothesis generation or partial identification of unknown unknowns. The method is called MESSAR (MEtabolite Sub-Structure Auto-Recommender) and is implemented in a free online web service available at messar.biodatamining.be. Citation: Liu Y, Mrzic A, Meysman P, De Vijlder T, Romijn EP, Valkenborg D, et al. (2020) MESSAR: Automated recommendation of metabolite substructures from tandem mass spectra. PLoS ONE 15(1): e0226770. https://doi.
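The idea of learning associations between spectral features and substructures, then using them to recommend substructures for an unknown spectrum, can be illustrated with a minimal Python sketch. It is not the MESSAR implementation: the toy library, the feature and substructure labels, the thresholds, and the functions mine_rules and recommend are all hypothetical, and rules are restricted to one feature implying one substructure.

```python
from collections import Counter
from itertools import product

# Hypothetical toy library: each entry pairs the spectral features of a
# reference spectrum with the substructures of its known molecule.
library = [
    ({"loss_18.01", "peak_91.05"}, {"hydroxyl", "benzyl"}),
    ({"loss_18.01"}, {"hydroxyl"}),
    ({"peak_91.05", "peak_65.04"}, {"benzyl"}),
    ({"loss_18.01", "peak_65.04"}, {"hydroxyl"}),
]


def mine_rules(library, min_support=2, min_confidence=0.6):
    """Return {(feature, substructure): confidence} for rules feature -> substructure."""
    feature_counts = Counter()
    pair_counts = Counter()
    for features, substructures in library:
        feature_counts.update(features)
        pair_counts.update(product(features, substructures))
    rules = {}
    for (feature, sub), n in pair_counts.items():
        # Confidence: fraction of spectra with the feature that also carry the substructure.
        confidence = n / feature_counts[feature]
        if n >= min_support and confidence >= min_confidence:
            rules[(feature, sub)] = confidence
    return rules


def recommend(spectrum_features, rules):
    """Rank substructures by the best confidence among rules triggered by the spectrum."""
    scores = {}
    for (feature, sub), confidence in rules.items():
        if feature in spectrum_features:
            scores[sub] = max(scores.get(sub, 0.0), confidence)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


rules = mine_rules(library)
print(recommend({"peak_91.05", "peak_200.00"}, rules))
```

Because recommendations come from the learned rules alone, no list of candidate molecules is needed for the query spectrum, which is the property the abstract emphasizes.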
Abstract: Current T-cell epitope prediction tools are a valuable resource in designing targeted immunogenicity experiments. They typically focus on, and are able to accurately predict, peptide binding and presentation by major histocompatibility complex (MHC) molecules on the surface of antigen-presenting cells. However, recognition of the peptide-MHC complex by a T-cell receptor is often not included in these tools. We developed a classification approach based on random forest classifiers to predict recognition of a peptide by a T-cell and discover patterns that contribute to recognition. We considered two approaches to solve this problem: (1) distinguishing between two sets of T-cell receptors that each bind to a known peptide and (2) retrieving T-cell receptors that bind to a given peptide from a large pool of T-cell receptors. Evaluation of the models on two HIV-1, B*08-restricted epitopes reveals good performance and hints towards structural CDR3 features that can determine peptide immunogenicity. These results are of particular importance as they show that prediction of T-cell epitope and T-cell epitope recognition based on sequence data is a feasible approach. In addition, the validity of our models not only serves as a proof of concept for the prediction of immunogenic T-cell epitopes but also paves the way for more general and high-performing models.
Searching for interesting common subgraphs in graph data is a well-studied problem in data mining. Subgraph mining techniques focus on the discovery of patterns in graphs that exhibit a specific network structure that is deemed interesting within these data sets. The definition of which subgraphs are interesting and which are not is highly dependent on the application. These techniques have seen numerous applications and are able to tackle a range of biological research questions, spanning from the detection of common substructures in sets of biomolecular compounds, to the discovery of network motifs in large-scale molecular interaction networks. Thus far, information about the bioinformatics application of subgraph mining remains scattered over heterogeneous literature. In this review, we provide an introduction to subgraph mining for life scientists. We give an overview of various subgraph mining algorithms from a bioinformatics perspective and present several of their potential biomedical applications. Electronic supplementary material: The online version of this article (10.1186/s13040-018-0181-9) contains supplementary material, which is available to authorized users.
Here we present a method that incorporates a classic liquid chromatography/mass spectrometry (LC/MS) workflow with fragmentation models and computational algorithms. The assumptions upon which the concept of the method was built were shown to be valid and the method showed that in-source fragmentation can be used to pinpoint structural similarities and indicate the occurrence of a modification.
Background: Mass spectrometry-based proteomics experiments generate spectra that are rich in information. Often only a fraction of this information is used for peptide/protein identification, whereas a significant proportion of the peaks in a spectrum remain unexplained. In this paper we explore how a specific class of data mining techniques termed “frequent itemset mining” can be employed to discover patterns in the unassigned data, and how such patterns can help us interpret the origin of the unexpected/unexplained peaks. Results: First a model is proposed that describes the origin of the observed peaks in a mass spectrum. For this purpose we use the classical correlative database search algorithm. Peaks that support a positive identification of the spectrum are termed explained peaks. Next, frequent itemset mining techniques are introduced to infer which unexplained peaks are associated in a spectrum. The method is validated on two types of experimental proteomic data. First, peptide mass fingerprint data is analyzed to explain the unassigned peaks in a full scan mass spectrum. Interestingly, a large number of experimental spectra reveal several highly frequent unexplained masses, and pattern mining on these frequent masses demonstrates that subsets of these peaks frequently co-occur. Further evaluation shows that several of these co-occurring peaks indeed have a known common origin, and other patterns are promising hypothesis generators for further analysis. Second, the proposed methodology is validated on tandem mass spectrometry data using a public spectral library, where associations within the mass differences of unassigned peaks and peptide modifications are explored.
The investigation of the found patterns illustrates that meaningful patterns can be discovered that can be explained by features of the employed technology and found modifications. Conclusions: This simple approach offers opportunities to monitor accumulating unexplained mass spectrometry data for emerging new patterns, with possible applications for the development of mass exclusion lists, for the refinement of quality control strategies and for a further interpretation of unexplained spectral peaks in mass spectrometry and tandem mass spectrometry. Electronic supplementary material: The online version of this article (doi:10.1186/s12953-014-0054-1) contains supplementary material, which is available to authorized users.
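As an illustration of how frequent itemset mining can surface co-occurring unexplained peaks, the following is a minimal Apriori-style sketch in Python. It is not the authors' implementation: the toy spectra, the rounded masses, the support threshold, and the function name frequent_itemsets are hypothetical, and each spectrum is reduced to the set of its unexplained peak masses.

```python
from itertools import combinations

# Hypothetical toy data: each spectrum is the set of its unexplained peak masses.
spectra = [
    {842.5, 1045.6, 2211.1},
    {842.5, 2211.1},
    {842.5, 1045.6},
    {1045.6, 2211.1, 1475.8},
    {842.5, 2211.1, 1475.8},
]


def frequent_itemsets(transactions, min_support):
    """Level-wise search: an itemset can only be frequent if all its subsets are."""
    result = {}
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    while candidates:
        # Support: number of spectra that contain every mass of the candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        # Join frequent k-itemsets into (k+1)-candidates and prune those
        # that have an infrequent k-subset.
        next_candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == len(a) + 1 and all(
                frozenset(s) in frequent for s in combinations(union, len(a))
            ):
                next_candidates.add(union)
        candidates = list(next_candidates)
    return result


result = frequent_itemsets(spectra, min_support=3)
for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

In this toy data only the pair of masses 842.5 and 2211.1 co-occurs in at least three spectra; a pattern of that kind would be a candidate for a common origin that warrants further inspection.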
bioRxiv preprint first posted online May 4, 2017; doi: http://dx.doi.org/10.1101/134189. Despite the increasing importance of metabolomics approaches, the structural elucidation of metabolites from mass spectral data remains a challenge. Although several reliable tools to identify known metabolites exist, identifying compounds that have not been previously seen is a challenging task that still eludes modern bioinformatics tools. Here, we describe an automated method for substructure recommendation from mass spectra using pattern mining techniques. Based on previously seen recurring substructures our approach succeeds in identifying parts of unknown metabolites. An important advantage of this approach is that it does not require any prior information concerning the metabolites to be identified, and therefore it can be used for the (partial) identification of unknown unknowns. Using association rule mining we are able to recommend valid substructures even for those metabolites for which no match can be found in spectral libraries or structural databases. We further demonstrate how this approach is complementary to existing metabolite identification tools, achieving improved identification results. The method is called MESSAR (MEtabolite SubStructure AutoRecommender) and is implemented as a free online web service available at