Nonnegative matrix factorization (NMF)-based models represent a target matrix well, which is critically important in collaborative filtering (CF)-based recommender systems. However, current NMF-based CF recommenders suffer from high computational and storage complexity and a slow convergence rate, which prevents their industrial use in the context of big data. To address these issues, this paper proposes an alternating direction method (ADM)-based nonnegative latent factor (ANLF) model. The main idea is to perform the ADM-based optimization with respect to each single feature, to obtain a high convergence rate as well as low complexity. Both the computational and storage costs of ANLF are linear in the amount of known data in the target matrix, which ensures high efficiency when dealing with the extremely sparse matrices usually seen in CF problems. As demonstrated by experiments on large, real data sets, ANLF also achieves fast convergence and high prediction accuracy while maintaining the nonnegativity constraints. Moreover, it is simple and easy to implement for real applications of learning systems.
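The idea of optimizing one latent feature at a time, at a cost linear in the number of known entries, can be illustrated with a small sketch. The code below is not the ADM update of the ANLF model, whose exact rules are not given in the abstract; it swaps in a plain projected coordinate descent as a stand-in. The function names, the regularization weight `reg`, and the toy ratings are all assumptions made for illustration.

```python
import numpy as np


def nonneg_coordinate_descent(rows, cols, vals, n_rows, n_cols,
                              rank=2, reg=0.05, sweeps=50, seed=0):
    """Factor the known entries (rows[i], cols[i]) -> vals[i] into P @ Q.T
    with P, Q >= 0, re-fitting one latent feature at a time.
    Stand-in technique (projected coordinate descent), not the ADM scheme."""
    rows = np.asarray(rows)
    cols = np.asarray(cols)
    vals = np.asarray(vals, dtype=float)
    rng = np.random.default_rng(seed)
    P = rng.random((n_rows, rank))
    Q = rng.random((n_cols, rank))
    # The residual is stored for known entries only, so memory is O(nnz).
    resid = vals - np.sum(P[rows] * Q[cols], axis=1)
    for _ in range(sweeps):
        for k in range(rank):
            # Take feature k out of the residual before re-fitting it.
            resid += P[rows, k] * Q[cols, k]
            qk = Q[cols, k]
            num = np.bincount(rows, weights=resid * qk, minlength=n_rows)
            den = np.bincount(rows, weights=qk * qk, minlength=n_rows) + reg
            P[:, k] = np.maximum(num / den, 0.0)  # project onto p >= 0
            pk = P[rows, k]
            num = np.bincount(cols, weights=resid * pk, minlength=n_cols)
            den = np.bincount(cols, weights=pk * pk, minlength=n_cols) + reg
            Q[:, k] = np.maximum(num / den, 0.0)  # project onto q >= 0
            # Put the re-fitted feature k back into the residual.
            resid -= P[rows, k] * Q[cols, k]
    return P, Q


def rmse(P, Q, rows, cols, vals):
    """Root-mean-square error on the known entries."""
    pred = np.sum(P[np.asarray(rows)] * Q[np.asarray(cols)], axis=1)
    return float(np.sqrt(np.mean((np.asarray(vals) - pred) ** 2)))


# Toy data: 8 known ratings in a 4 x 4 matrix (values are made up).
rows = [0, 0, 1, 1, 2, 2, 3, 3]
cols = [0, 1, 1, 2, 0, 3, 2, 3]
vals = [5.0, 3.0, 4.0, 1.0, 4.0, 2.0, 1.0, 5.0]
P, Q = nonneg_coordinate_descent(rows, cols, vals, 4, 4)
print("training RMSE:", rmse(P, Q, rows, cols, vals))
```

Each sweep touches every known entry a constant number of times per feature, so the work per sweep grows with `rank` times the number of known entries and never with the full matrix size.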
Background: Proteins are important molecules that participate in virtually every aspect of cellular function within an organism, often in pairs. Although high-throughput technologies have generated considerable protein-protein interaction (PPI) data for various species, the experimental methods are both time-consuming and expensive. In addition, they are usually associated with high rates of both false positive and false negative results. Accordingly, a number of computational approaches have been developed to predict protein interactions effectively and accurately. However, most of these methods typically perform worse when other biological data sources (e.g., protein structure information, protein domains, or gene neighborhood information) are not available. Therefore, there is a pressing need for effective computational methods that predict PPIs using protein sequence information alone.
Results: In this study, we present a novel computational model combining a weighted sparse representation based classifier (WSRC) and global encoding (GE) of the amino acid sequence. Two kinds of protein descriptors, composition and transition, are extracted to represent each protein sequence. On the basis of this feature representation, a novel weighted sparse representation based classifier is introduced to predict the protein interaction class. When the proposed method was evaluated on the PPI data of S. cerevisiae, Human and H. pylori, it achieved high prediction accuracies of 96.82%, 97.66% and 92.83%, respectively. Extensive experiments were also performed for cross-species PPI prediction, and the prediction accuracies were very promising.
Conclusions: To further evaluate the performance of the proposed method, we then compared it with a method based on the support vector machine (SVM). The results show that the proposed method achieved a significant improvement. Thus, the proposed method is a very efficient way to predict PPIs and may be a useful supplementary tool for future proteomics studies.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1035-4) contains supplementary material, which is available to authorized users.
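As a rough illustration of what composition and transition descriptors can look like, the sketch below binary-encodes a sequence against one amino acid group and measures how often the group occurs and how often the encoding switches. The grouping `HYDROPHOBIC`, the function names, and the example sequence are hypothetical; the abstract does not state the groupings or the exact descriptor definitions used in the study, so this is one plausible reading only.

```python
# HYPOTHETICAL grouping, chosen only for illustration.
HYDROPHOBIC = set("AVLIMC")


def binary_encode(sequence, group):
    """Map each residue to 1 if it belongs to `group`, else 0."""
    return [1 if residue in group else 0 for residue in sequence]


def composition(bits):
    """Fraction of positions encoded as 1."""
    return sum(bits) / len(bits)


def transition(bits):
    """Fraction of adjacent position pairs where the encoding changes."""
    changes = sum(a != b for a, b in zip(bits, bits[1:]))
    return changes / (len(bits) - 1)


bits = binary_encode("AVKKDA", HYDROPHOBIC)  # made-up sequence
print(bits, composition(bits), transition(bits))
```

For the made-up sequence `AVKKDA` the encoding is `[1, 1, 0, 0, 0, 1]`, so the composition is 3/6 and the transition is 2/5. Descriptors of this kind, computed for several groupings, would then be concatenated into the feature vector passed to a classifier.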
The extreme learning machine (ELM) has drawn intensive research attention due to its effectiveness in solving many machine learning problems. However, the matrix inversion operation involved in the algorithm is computationally prohibitive and limits the wide application of ELM in many scenarios. To overcome this problem, in this paper, we propose an inverse-free ELM that incrementally increases the number of hidden nodes and updates the connection weights progressively and optimally. Theoretical analysis proves that the training error decreases monotonically under the proposed updating procedure and that every updating step is optimal. Extensive numerical experiments show the effectiveness and accuracy of the proposed algorithm.
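One generic way to add hidden nodes without ever calling a matrix inversion routine is the bordering (Schur complement) update of an existing inverse. The sketch below uses that textbook identity; it is not claimed to be the update rule derived in the paper. The names `grow_inverse` and `incremental_elm`, the `tanh` activation, the ridge term `reg`, and the toy regression task are assumptions.

```python
import numpy as np


def grow_inverse(A_inv, b, c):
    """Return the inverse of [[A, b], [b.T, c]] given A_inv = inv(A), A symmetric.
    Only matrix-vector products and one scalar division are needed."""
    Ab = A_inv @ b
    s = c - b @ Ab                      # scalar Schur complement
    top_left = A_inv + np.outer(Ab, Ab) / s
    side = -Ab[:, None] / s
    return np.block([[top_left, side], [side.T, np.array([[1.0 / s]])]])


def incremental_elm(X, y, n_hidden, reg=1e-2, seed=0):
    """Ridge-regularized ELM whose Gram inverse grows one hidden node at a time."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))   # random input weights
    bias = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + bias)                         # hidden-layer outputs
    # One hidden node: the 1 x 1 "inverse" is just a reciprocal.
    G_inv = np.array([[1.0 / (H[:, 0] @ H[:, 0] + reg)]])
    for k in range(1, n_hidden):
        h = H[:, k]
        # Border the current inverse with the new node's Gram entries.
        G_inv = grow_inverse(G_inv, H[:, :k].T @ h, h @ h + reg)
    beta = G_inv @ (H.T @ y)                          # output weights
    return W, bias, beta


# Toy regression task (made up): fit sin(x) on 60 random points.
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X[:, 0])
W, bias, beta = incremental_elm(X, y, n_hidden=8)
pred = np.tanh(X @ W + bias) @ beta
print("training MSE:", float(np.mean((pred - y) ** 2)))
```

Because the regularized Gram matrix is positive definite, the scalar `s` stays positive, so each growth step is well defined and costs on the order of the square of the current number of hidden nodes.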
Automatic Web-service selection is an important research topic in the domain of service computing. During this process, reliable predictions for quality of service (QoS) based on historical service invocations are vital to users. This work aims at making highly accurate predictions for missing QoS data by building an ensemble of nonnegative latent factor (NLF) models. Its motivations are: 1) fulfilling nonnegativity constraints better represents the positive-valued nature of QoS data, thereby boosting the prediction accuracy, and 2) since QoS prediction is a learning task, a carefully designed ensemble model is a promising way to further improve the prediction accuracy. To achieve this, we first implement an NLF model for QoS prediction. This model is then diversified through feature sampling and randomness injection to form a diversified NLF model, on which an ensemble is built. Comparisons between the proposed ensemble and several widely employed and state-of-the-art QoS predictors on two large, real data sets demonstrate that the former outperforms the latter in terms of prediction accuracy.
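The ensemble idea can be sketched by averaging several nonnegative factor models that differ in rank and random initialization. This is a simplified stand-in: the base learner is a standard masked multiplicative-update NMF on a small dense matrix, diversity comes only from the rank and the seed, and the feature sampling scheme mentioned in the abstract is not reproduced. The function names, the `reg` weight, and the QoS values are assumptions.

```python
import numpy as np


def masked_nmf_predict(R, M, rank, iters=200, reg=0.05, seed=0):
    """Fit R ~ P @ Q.T on entries where M == 1 with multiplicative updates,
    which keep P and Q nonnegative, and return the full prediction matrix."""
    rng = np.random.default_rng(seed)
    P = rng.random((R.shape[0], rank)) + 0.1
    Q = rng.random((R.shape[1], rank)) + 0.1
    eps = 1e-9                      # guards against division by zero
    MR = M * R
    for _ in range(iters):
        P *= (MR @ Q) / ((M * (P @ Q.T)) @ Q + reg * P + eps)
        Q *= (MR.T @ P) / ((M * (P @ Q.T)).T @ P + reg * Q + eps)
    return P @ Q.T


def ensemble_predict(R, M, ranks=(2, 3, 4), seeds=(0, 1, 2)):
    """Average the predictions of base models diversified by rank and seed."""
    preds = [masked_nmf_predict(R, M, rank=r, seed=s)
             for r, s in zip(ranks, seeds)]
    return np.mean(preds, axis=0)


# Made-up response times (seconds); 0.0 marks a missing user-service entry.
R = np.array([[0.9, 0.0, 1.4, 0.3],
              [1.1, 0.5, 0.0, 0.4],
              [0.0, 0.6, 1.5, 0.0],
              [1.0, 0.0, 1.3, 0.2]])
M = (R > 0).astype(float)
pred = ensemble_predict(R, M)
print(np.round(pred, 2))
```

Averaging nonnegative predictions keeps the ensemble output nonnegative, which matches the positive-valued nature of QoS data. Unlike a model built for sparse data, this dense sketch does not scale linearly with the number of known entries.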