Jonathan Q. Jiang scite author profile

IEEE/ACM Trans. Comput. Biol. and Bioinf.

McQuay

2012

Assigning biological functions to uncharacterized proteins is a fundamental problem in the postgenomic era. The increasing availability of large amounts of data on protein-protein interactions (PPIs) has led to the emergence of a considerable number of computational methods for determining protein function in the context of a network. These algorithms, however, treat each functional class in isolation and therefore often suffer when labeled data are scarce. In reality, different functional classes are naturally dependent on one another. We propose a new algorithm, Multi-label Correlated Semi-supervised Learning (MCSL), which incorporates the intrinsic correlations among functional classes into protein function prediction by leveraging the relationships provided by the PPI network and the functional class network. The guiding intuition is that the classification function should be sufficiently smooth on subgraphs where the respective topologies of these two networks are a good match. We encode this intuition as regularized learning with intraclass and interclass consistency, which can be understood as an extension of the graph-based learning with local and global consistency (LGC) method. Cross-validation on the yeast proteome shows that MCSL consistently outperforms several state-of-the-art methods. Most notably, it effectively overcomes the problem of scarce labeled data. The supplementary files are freely available at http://sites.google.com/site/csaijiang/MCSL.
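Since MCSL is described as an extension of LGC, a minimal sketch of the single-network LGC baseline may help: it computes the closed form F = (I - αS)^{-1} Y with S = D^{-1/2} W D^{-1/2} (up to a constant factor, which does not change the ranking). It treats each class independently and does not include MCSL's interclass term over the functional class network. The function name `lgc`, the default α = 0.99 and the toy path graph are illustrative assumptions.

```python
import numpy as np

def lgc(W, Y, alpha=0.99):
    """Local and global consistency label propagation, closed form.

    W: symmetric n x n affinity matrix (e.g. a PPI adjacency matrix).
    Y: n x c initial labels (Y[i, j] = 1 if protein i is annotated with class j).
    Returns F = (I - alpha * S)^{-1} Y with S = D^{-1/2} W D^{-1/2}.
    """
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)
    d_inv_sqrt = np.zeros_like(d)
    d_inv_sqrt[d > 0] = 1.0 / np.sqrt(d[d > 0])  # isolated vertices keep weight 0
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]  # symmetric normalization
    return np.linalg.solve(np.eye(len(d)) - alpha * S, np.asarray(Y, dtype=float))

# Hypothetical toy PPI path 0-1-2-3: protein 0 carries class 0, protein 3 carries class 1.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
Y = np.array([[1, 0], [0, 0], [0, 0], [0, 1]])
F = lgc(W, Y)
print(F.argmax(axis=1))  # [0 0 1 1]
```

Each unlabeled protein takes the class of the nearer labeled protein. MCSL's contribution, per the abstract, is to couple these per-class problems through the functional class network rather than solving them separately.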

Learning Protein Functions from Bi-relational Graph of Proteins and Function Annotations

2011

We propose here a multi-label semi-supervised learning algorithm, PfunBG, to predict protein functions using a bi-relational graph (BG) of proteins and function annotations. Unlike most, if not all, existing methods, which consider only the partially labeled protein-protein interaction (PPI) network, the BG comprises three components: a PPI network, a function class graph induced from the function annotations of these proteins, and a bipartite graph induced from function assignments. By treating proteins and function classes equally as vertices, we exploit network propagation to measure how closely a specific function class is related to a protein of interest. Experiments on a yeast PPI network demonstrate its effectiveness and efficiency.
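To make the bi-relational graph concrete, here is a minimal sketch that stacks the three components into one block adjacency matrix and scores function classes for a protein with a generic random walk with restart. This is a swapped-in, standard propagation scheme, not necessarily the one PfunBG uses. The co-annotation construction of the function class graph, the restart probability 0.3 and all names (`bigraph_rwr`, the toy data) are assumptions.

```python
import numpy as np

def bigraph_rwr(W_ppi, B, W_fun, query, restart=0.3):
    """Random walk with restart over a bi-relational graph (illustrative only).

    W_ppi: n_p x n_p protein-protein adjacency; B: n_p x n_f protein-function
    assignment matrix; W_fun: n_f x n_f function-class graph.
    Returns one relevance score per function class for protein `query`.
    """
    n_p, n_f = B.shape
    n = n_p + n_f
    # Proteins and function classes become vertices of a single graph.
    A = np.block([[W_ppi, B], [B.T, W_fun]]).astype(float)
    deg = A.sum(axis=0)
    deg[deg == 0] = 1.0            # avoid dividing by zero for isolated vertices
    P = A / deg                    # column-stochastic transition matrix
    e = np.zeros(n)
    e[query] = 1.0                 # restart vector concentrated on the query protein
    # Stationary solution of r = (1 - restart) * P r + restart * e.
    r = restart * np.linalg.solve(np.eye(n) - (1 - restart) * P, e)
    return r[n_p:]                 # scores of the function-class vertices

# Hypothetical toy data: PPI path p0-p1-p2-p3; p0 annotated with f0, p3 with f1.
W_ppi = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
B = np.array([[1, 0], [0, 0], [0, 0], [0, 1]])
W_fun = B.T @ B                    # co-annotation counts (assumed construction)
np.fill_diagonal(W_fun, 0)
scores = bigraph_rwr(W_ppi, B, W_fun, query=1)
print(scores.argmax())  # 0: f0 (two hops from p1) outranks f1 (three hops)
```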

Predicting multiplex subcellular localization of proteins using protein-protein interaction network: a comparative study

2012

BMC Bioinformatics

Background: Proteins that interact in vivo tend to reside within the same or "adjacent" subcellular compartments. This observation provides opportunities to reveal protein subcellular localization in the context of the protein-protein interaction (PPI) network. However, so far only a few efforts, based on heuristic rules, have been made in this regard.

Results: We systematically and quantitatively validate the hypothesis that physically interacting proteins probably share at least one common subcellular localization. Building on this result, we introduce, for the first time, four graph-based semi-supervised learning algorithms originally proposed for protein function prediction (Majority, χ2-score, GenMultiCut and FunFlow) to assign "multiplex localization" to proteins. We analyze these approaches by performing a large-scale cross-validation on a Saccharomyces cerevisiae proteome compiled from BioGRID and comparing their predictions for 22 protein subcellular localizations. Furthermore, we build an ensemble classifier to associate 529 unlabeled and 137 ambiguously annotated proteins with subcellular localizations, most of which have been verified in previous experimental studies.

Conclusions: Physical interaction between proteins provides an essential clue to their co-localization. Compared with the local approaches (Majority and χ2-score), the global algorithms (GenMultiCut and FunFlow) consistently achieve superior performance.
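The local idea behind Majority-style prediction can be sketched in a few lines: rank localizations by how many interaction partners carry them, so a protein may receive several top-ranked compartments ("multiplex localization"). This is a simplified neighbour-counting sketch under assumed data structures (dicts of partner lists and localization sets); the function name, protein IDs and compartments are hypothetical, and how many top entries to keep is left to the user.

```python
from collections import Counter

def majority_ranking(adj, labels, protein):
    """Rank localizations by how many interaction partners carry them.

    adj:    dict protein -> list of interacting proteins
    labels: dict protein -> set of known localizations (unlabeled proteins may be absent)
    Returns (localization, count) pairs, most frequent first.
    """
    counts = Counter()
    for partner in adj.get(protein, []):
        counts.update(labels.get(partner, ()))  # each partner votes once per localization
    return counts.most_common()

# Hypothetical toy network: P1 interacts with three annotated proteins.
adj = {"P1": ["P2", "P3", "P4"]}
labels = {"P2": {"nucleus"}, "P3": {"nucleus", "cytoplasm"}, "P4": {"mitochondrion"}}
print(majority_ranking(adj, labels, "P1")[0])  # ('nucleus', 2)
```

Because it only looks at direct partners, this rule is "local"; the global methods mentioned above instead propagate evidence across the whole network.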

Modularity functions maximization with nonnegative relaxation facilitates community detection in networks

Physica A: Statistical Mechanics and its Applications

McQuay

2012

We show here that the problem of maximizing a family of quantitative functions, encompassing both modularity (Q-measure) and modularity density (D-measure), for community detection can be understood in a unified way as a combinatorial optimization problem involving the trace of a matrix called the modularity Laplacian. Instead of using the traditional spectral relaxation, we impose an additional nonnegativity constraint on this graph clustering problem and design efficient algorithms to optimize the new objective. With the explicit nonnegativity constraint, our solutions are very close to the ideal community indicator matrix and can directly assign nodes to communities. The near-orthogonal columns of the solution can be interpreted as the posterior probability that the corresponding node belongs to each community. Therefore, the proposed method can be used to identify fuzzy or overlapping communities and thus facilitates understanding of the intrinsic structure of networks. Experimental results show that our new algorithm consistently, and sometimes significantly, outperforms the traditional spectral relaxation approaches.
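The trace formulation can be checked numerically for the Q-measure using Newman's modularity matrix B = A - kk^T/(2m): for a binary indicator matrix H, Q = Tr(H^T B H)/(2m). The paper's "modularity Laplacian", which also covers the D-measure, may be defined differently, and its nonnegative optimization algorithm is not reproduced here. The sketch below only evaluates the objective; a relaxed method would replace the binary H with a nonnegative one. All function names and the two-triangle graph are illustrative assumptions.

```python
import numpy as np

def modularity_matrix(A):
    """Return (B, 2m) with Newman's modularity matrix B = A - k k^T / (2m)."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)                      # vertex degrees
    two_m = k.sum()                        # twice the number of edges
    return A - np.outer(k, k) / two_m, two_m

def modularity_trace(A, H):
    """Q = Tr(H^T B H) / (2m); H is an n x c (binary or relaxed nonnegative) membership matrix."""
    B, two_m = modularity_matrix(A)
    H = np.asarray(H, dtype=float)
    return np.trace(H.T @ B @ H) / two_m

def indicator(labels, c):
    """Binary n x c community indicator matrix built from a label vector."""
    H = np.zeros((len(labels), c))
    H[np.arange(len(labels)), labels] = 1.0
    return H

# Two triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

Q = modularity_trace(A, indicator([0, 0, 0, 1, 1, 1], 2))
print(round(Q, 4))  # 0.3571, i.e. 5/14
```

Putting every node in one community gives Q = 0, since the rows of B sum to zero.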