This paper constructs a reproducing kernel Hilbert space framework for multiple kernel learning, which clarifies why multiple kernel support vector machines (SVMs) outperform single kernel SVMs. These results can serve as a fundamental guide to account for the superiority of multiple kernel over single kernel learning. Subsequently, the constructed multiple kernel learning algorithms are applied to model a nonlinear blast furnace system based only on its input-output signals. The experimental results not only confirm the superiority of multiple kernel learning algorithms, but also indicate that multiple kernel SVM is a highly competitive data-driven modeling method for the blast furnace system and can provide reliable indications for blast furnace operators to take control actions.
Note to Practitioners: This paper is motivated by the problem of predicting the silicon content in blast furnace hot metal, which is an open problem in realizing blast furnace automation. Here, based on single kernel and multiple kernel SVMs, we pay special attention to silicon trend prediction, since it provides a more direct guideline for taking control actions in blast furnace operation. Theoretically, we give detailed reasons why multiple kernel SVM is superior to single kernel SVM, which improves the transparency of the multiple kernel learning algorithm. The experimental results not only confirm the superiority of multiple kernel learning algorithms, but also indicate that multiple kernel SVM is a highly competitive data-driven modeling method for the blast furnace system and can provide reliable indications for blast furnace operators to take control actions.
Index Terms: Data-driven, multiple kernel support vector machines (SVM), nonlinear blast furnace system, quadratically constrained quadratic programming, reproducing kernel Hilbert space.
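As a minimal sketch of what "multiple kernel" means in the abstract above, the Python below builds a Gram matrix from a weighted sum of two base kernels. The choice of RBF and polynomial kernels, the fixed weights, the sample points, and all function names are illustrative assumptions, not taken from the paper. The paper's index terms point to a quadratically constrained quadratic programming formulation for the combination, which this sketch does not attempt.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    # Gaussian (RBF) base kernel; gamma is an assumed value.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def poly_kernel(x, y, degree=2, coef0=1.0):
    # Polynomial base kernel; degree and coef0 are assumed values.
    return (sum(a * b for a, b in zip(x, y)) + coef0) ** degree

def combined_kernel(x, y, weights=(0.5, 0.5)):
    # Nonnegative weighted sum of base kernels. The weights are fixed
    # by hand here; a multiple kernel learner would optimize them.
    w_rbf, w_poly = weights
    return w_rbf * rbf_kernel(x, y) + w_poly * poly_kernel(x, y)

def gram_matrix(points, kernel):
    # Full kernel (Gram) matrix over a list of points.
    return [[kernel(p, q) for q in points] for p in points]

# Hypothetical input vectors, for illustration only.
points = [(0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
K = gram_matrix(points, combined_kernel)
```

A nonnegative weighted sum of valid kernels is itself a valid kernel, so a matrix such as `K` can be handed to any SVM solver that accepts a precomputed kernel.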
Liquid chromatography coupled with tandem mass spectrometry has revolutionized the proteomic analysis of complexes, cells, and tissues. In a typical proteomic analysis, the tandem mass spectra from an LC/MS/MS experiment are assigned to peptides by a search engine that compares the experimental MS/MS peptide data to theoretical peptide sequences in a protein database. The peptide-spectrum matches are then used to infer a list of identified proteins in the original sample. However, search engines often fail to distinguish between correct and incorrect peptide assignments. In this study, we designed and implemented a novel algorithm called De-Noise to reduce the number of incorrect peptide matches and maximize the number of correct peptides at a fixed false discovery rate, using a minimal number of scoring outputs from the SEQUEST search engine. The algorithm uses a three-step process: data cleaning, data refining through an SVM-based decision function, and a final data refining step based on proteolytic peptide patterns. Using proteomics data generated on different types of mass spectrometers, we optimized the De-Noise algorithm based on the resolution and mass accuracy of the mass spectrometer employed in the LC/MS/MS experiment. Our results demonstrate that De-Noise improves peptide identification compared to other methods used to process the peptide sequence matches assigned by SEQUEST. Because De-Noise uses a limited number of scoring attributes, it can be easily implemented with other search engines.
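To make "maximize the number of correct peptides at a fixed false discovery rate" concrete, here is a hedged Python sketch of score thresholding. It assumes the false discovery rate is estimated with a target-decoy search as decoys divided by targets above the threshold, which the abstract does not state. It is not the De-Noise algorithm, and the function name, scores, and labels are hypothetical.

```python
def accepted_targets_at_fdr(psms, max_fdr=0.05):
    """Return (score_threshold, n_targets) for the most permissive cut
    whose estimated FDR stays at or below max_fdr.

    psms: list of (score, is_decoy) pairs; higher score means better match.
    FDR estimate (assumed): decoys / targets among PSMs at or above the cut.
    Tied scores are not treated specially in this sketch.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    best = (None, 0)
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            best = (score, targets)
    return best

# Hypothetical PSM scores, for illustration only.
example_psms = [
    (5.0, False), (4.5, False), (4.0, False), (3.5, True),
    (3.0, False), (2.0, True), (1.0, False),
]
result = accepted_targets_at_fdr(example_psms, max_fdr=0.25)
```

Under this estimate, a better scoring function is one that moves decoys lower in the ranking, so more targets are accepted before the cap is reached.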
Background: Peptide sequence assignment is the central task in protein identification with MS/MS-based strategies. Although a number of post-database search algorithms for filtering target peptide spectrum matches (PSMs) have been developed, the discrepancy among the output PSMs is usually significant, leaving a few disputable PSMs. Current studies show that a number of target PSMs that are close to decoy PSMs can hardly be separated from those decoys using the discrimination function alone.
Results: In this paper, we assign each target PSM a weight indicating how likely it is to be correct. We employ an SVM-based learning model to search for the optimal weight for each target PSM and develop a new scoring system, CRanker, to rank all target PSMs. Because routine database searches generate large PSM datasets, we use the Cholesky factorization technique to store the kernel matrix and reduce the memory requirement.
Conclusions: Compared with PeptideProphet and Percolator, CRanker identified more PSMs under similar false discovery rates over different datasets. CRanker showed consistent performance on different test sets, which supports the validity of the proposed model.
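The memory saving from a Cholesky-based representation of a kernel matrix can be sketched as follows. This Python assumes a pivoted incomplete Cholesky factorization, which stores `rank` columns of length n instead of the full n by n matrix. The abstract does not say which variant CRanker uses, so the algorithm choice, the RBF kernel, the sample points, and every name below are assumptions.

```python
import math

def rbf(x, y, gamma=1.0):
    # Assumed base kernel for the illustration.
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def pivoted_cholesky(points, kernel, max_rank, tol=1e-10):
    """Return a list of columns f such that K[i][j] ~ sum(f[i] * f[j]).

    Only the diagonal and one kernel column per step are evaluated,
    so the full kernel matrix is never stored.
    """
    n = len(points)
    diag = [kernel(p, p) for p in points]  # residual diagonal
    factors = []
    for _ in range(min(max_rank, n)):
        pivot = max(range(n), key=lambda i: diag[i])
        if diag[pivot] <= tol:
            break  # remaining residual is negligible
        root = math.sqrt(diag[pivot])
        col = []
        for i in range(n):
            residual = kernel(points[i], points[pivot]) - sum(
                f[i] * f[pivot] for f in factors
            )
            col.append(residual / root)
        for i in range(n):
            diag[i] -= col[i] ** 2
        factors.append(col)
    return factors

def reconstruct(factors, i, j):
    # Approximate kernel entry recovered from the stored columns.
    return sum(f[i] * f[j] for f in factors)

# Hypothetical one-dimensional points, for illustration only.
points = [(0.0,), (1.0,), (2.0,), (3.0,)]
factors = pivoted_cholesky(points, rbf, max_rank=4)
```

With `max_rank` equal to the number of points the factorization reproduces the kernel matrix up to rounding; choosing a smaller `max_rank` trades accuracy for memory.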