We propose a general framework for learning from labeled and unlabeled data on a directed graph that takes into account the structure of the graph, including the directionality of its edges. Thanks to recently developed numerical techniques, the algorithm derived from this framework runs in nearly linear time. In the absence of labeled instances, the framework can be used as a spectral clustering method for directed graphs, generalizing the spectral clustering approach for undirected graphs. We have applied the framework to real-world web classification problems and obtained encouraging results.
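The transductive and clustering steps can be illustrated with a small NumPy sketch. It assumes the framework is instantiated with a random walk with teleportation (jump probability `eta`, assumed to be 0.01), its stationary distribution π, the symmetric matrix Θ = (Π^{1/2} P Π^{-1/2} + Π^{-1/2} P^T Π^{1/2}) / 2 with Π = diag(π), and the classifier f = (I - αΘ)^{-1} y with an assumed α = 0.9. Without labels, the sign of Θ's second eigenvector gives a two-way partition. All function names and the toy graph are hypothetical, and a dense solver stands in for the nearly linear-time solvers mentioned in the abstract, so this only shows the computation on small graphs.

```python
import numpy as np


def teleporting_walk(W, eta=0.01):
    """Row-stochastic transition matrix of a random walk with teleportation.

    W[u, v] > 0 is the weight of the directed edge u -> v. With probability
    1 - eta the walker follows an out-edge in proportion to its weight; with
    probability eta it jumps to a uniformly random vertex. Vertices without
    out-edges always jump. eta = 0.01 is an assumed default.
    """
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    out = W.sum(axis=1, keepdims=True)
    follow = np.divide(W, out, out=np.zeros_like(W), where=out > 0)
    uniform = np.full((n, n), 1.0 / n)
    return np.where(out > 0, (1 - eta) * follow + eta * uniform, uniform)


def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Power iteration for pi with pi @ P = pi; teleportation makes the walk ergodic."""
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(max_iter):
        nxt = pi @ P
        if np.abs(nxt - pi).sum() < tol:
            return nxt
        pi = nxt
    return pi


def symmetrized_theta(P, pi):
    """Theta = (Pi^{1/2} P Pi^{-1/2} + Pi^{-1/2} P^T Pi^{1/2}) / 2."""
    s = np.sqrt(pi)
    A = s[:, None] * P / s[None, :]
    return 0.5 * (A + A.T)


def transduce(W, y, alpha=0.9, eta=0.01):
    """Solve (I - alpha * Theta) f = y; y is +1/-1 on labeled vertices, 0 elsewhere."""
    P = teleporting_walk(W, eta)
    theta = symmetrized_theta(P, stationary_distribution(P))
    y = np.asarray(y, dtype=float)
    f = np.linalg.solve(np.eye(len(y)) - alpha * theta, y)
    return np.sign(f)


def spectral_bipartition(W, eta=0.01):
    """Unlabeled case: split vertices by the sign of Theta's second eigenvector."""
    P = teleporting_walk(W, eta)
    theta = symmetrized_theta(P, stationary_distribution(P))
    _, vecs = np.linalg.eigh(theta)  # eigenvalues come back in ascending order
    return vecs[:, -2] >= 0


# Toy graph: two dense directed clusters {0..3} and {4..7}, joined by weak edges 3->4 and 7->0.
W = np.zeros((8, 8))
for block in (range(0, 4), range(4, 8)):
    for u in block:
        for v in block:
            if u != v:
                W[u, v] = 1.0
W[3, 4] = W[7, 0] = 0.1

y = np.zeros(8)
y[0], y[4] = 1.0, -1.0           # one labeled vertex per cluster
print(transduce(W, y))           # vertices 0-3 get +1, vertices 4-7 get -1
print(spectral_bipartition(W))   # one boolean value for 0-3, the other for 4-7
```

The dense solve costs O(n^3); on web-scale graphs one would instead solve the sparse system (I - αΘ) f = y iteratively, which is where fast numerical techniques matter.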
We consider spectral clustering and transductive inference for data with multiple views. A typical example is the web, which can be described either by the hyperlinks between web pages or by the words occurring in them. When each view is represented as a graph, one may take a convex combination of the weight matrices or of the discrete Laplacians of the graphs and then apply existing clustering or classification techniques. Such a solution may sound natural, but its underlying principle is unclear. Instead, we develop multiview spectral clustering by generalizing the normalized cut from a single view to multiple views, and we build multiview transductive inference on top of it. The resulting framework leads to a mixture of Markov chains, one defined on each graph. Experiments on real-world web classification show promising results that support the method.
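A minimal sketch of one way to form such a mixture, assuming each view is an undirected weighted graph in which every vertex has positive degree, and that at vertex u the walk picks view i with probability α_i π_i(u) / Σ_j α_j π_j(u), where π_i is the stationary distribution of the random walk on view i and the view weights α_i are nonnegative and sum to 1. Under these assumptions the mixed chain is reversible with stationary distribution π = Σ_i α_i π_i, so single-view spectral clustering (here, the sign of the second eigenvector of Π^{1/2} P Π^{-1/2}) carries over. The function names and the two toy views are hypothetical.

```python
import numpy as np


def view_walk(W):
    """Random walk on one undirected view: P = D^{-1} W and pi = d / sum(d)."""
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)  # assumed positive for every vertex
    return W / d[:, None], d / d.sum()


def mix_views(views, alphas):
    """Mixture of per-view walks: at vertex u, view i is used with prob alpha_i pi_i(u) / pi(u)."""
    walks = [view_walk(W) for W in views]
    pi = sum(a * pi_i for a, (_, pi_i) in zip(alphas, walks))
    P = sum((a * pi_i / pi)[:, None] * P_i for a, (P_i, pi_i) in zip(alphas, walks))
    return P, pi


def multiview_bipartition(views, alphas):
    """Two-way spectral clustering on the mixed chain, which is reversible."""
    P, pi = mix_views(views, alphas)
    s = np.sqrt(pi)
    theta = s[:, None] * P / s[None, :]   # Pi^{1/2} P Pi^{-1/2}, symmetric for a reversible chain
    theta = 0.5 * (theta + theta.T)       # remove floating-point asymmetry before eigh
    _, vecs = np.linalg.eigh(theta)       # eigenvalues in ascending order
    return vecs[:, -2] >= 0


def undirected(n, edges):
    """Hypothetical helper: symmetric weight matrix from (u, v, weight) triples."""
    W = np.zeros((n, n))
    for u, v, w in edges:
        W[u, v] = W[v, u] = w
    return W


# Two toy views of six pages that share the clusters {0, 1, 2} and {3, 4, 5}.
links = undirected(6, [(0, 1, 1), (0, 2, 1), (1, 2, 1), (3, 4, 1), (3, 5, 1), (4, 5, 1), (2, 3, 0.1)])
words = undirected(6, [(0, 1, 1), (1, 2, 1), (3, 4, 1), (4, 5, 1), (0, 5, 0.1)])

P, pi = mix_views([links, words], [0.5, 0.5])
print(np.allclose(pi @ P, pi))   # True: pi = sum_i alpha_i pi_i is stationary
print(multiview_bipartition([links, words], [0.5, 0.5]))
```

Choosing views in proportion to α_i π_i(u) rather than α_i alone is what keeps Σ_i α_i π_i stationary: π(u) P(u, v) = Σ_i α_i π_i(u) P_i(u, v), and summing over u gives Σ_i α_i π_i(v). With fixed per-vertex weights α_i, the stationary distribution of the mixture is generally not Σ_i α_i π_i.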
Generating alternative queries, also known as query suggestion, has long been shown to help users explore and express their information needs. In many scenarios, such suggestions can be generated from a large-scale graph of queries and accessory information such as clickthrough data. However, generating suggestions that remain semantically consistent with the original query is still a challenging problem. In this work, we propose a novel query suggestion algorithm that ranks queries by their hitting time on a large-scale bipartite graph. Without convoluted heuristics or heavy parameter tuning, the method clearly captures the semantic consistency between the suggested query and the original query. Experiments on a large-scale query log of a commercial search engine and on a scientific literature collection show that hitting time is effective for generating semantically consistent query suggestions. The proposed algorithm and its variants can also boost long-tail queries, support personalized query suggestion, and find related authors in research.
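The ranking step can be sketched on a toy query-URL click graph. The sketch assumes a random walk that alternates between queries and URLs in proportion to (hypothetical) click counts, and ranks each candidate query by its hitting time to the original query, that is, the expected number of steps a walk started at the candidate needs to first reach the original query. Hitting times satisfy h(original) = 0 and h(i) = 1 + Σ_j p_ij h(j) otherwise; iterating this recurrence a fixed number of times (an assumed `steps` = 50) gives the expected hitting time capped at that many steps. Each iteration is one matrix-vector product, which is cheap on a sparse click graph; dense matrices are used here only for brevity, and all names and data are illustrative.

```python
import numpy as np


def hitting_times(C, target, steps=50):
    """Capped hitting times from every query to the query `target`.

    C[q, u] is a (hypothetical) click count of query q on URL u. The walk
    alternates query -> URL -> query, choosing edges in proportion to clicks.
    Iterating h(target) = 0, h(i) = 1 + sum_j p_ij h(j) for `steps` rounds,
    starting from h = 0, gives E[min(tau_i, steps)], where tau_i is the first
    time a walk started at vertex i reaches `target`.
    """
    C = np.asarray(C, dtype=float)
    nq, nu = C.shape
    A = np.block([[np.zeros((nq, nq)), C], [C.T, np.zeros((nu, nu))]])
    P = A / A.sum(axis=1, keepdims=True)  # assumes every query and URL has a click
    h = np.zeros(nq + nu)
    for _ in range(steps):
        h = 1.0 + P @ h
        h[target] = 0.0
    return h[:nq]


def suggest(C, target, k=3, steps=50):
    """Rank the other queries by increasing hitting time to the original query."""
    h = hitting_times(C, target, steps)
    ranked = [int(q) for q in np.argsort(h, kind="stable") if q != target]
    return ranked[:k]


# Hypothetical click counts: rows are queries 0-3, columns are URLs 0-2; query 0 is the original.
C = [[5, 1, 0],
     [4, 0, 0],
     [0, 1, 3],
     [0, 0, 2]]
print(suggest(C, target=0))   # [1, 2, 3]
```

Query 1 shares the heavily clicked URL 0 with the original query and comes first; query 3 can reach the original query only by passing through query 2, so it is ranked last. With enough iterations the capped values approach the exact hitting times, which for this toy graph are 3.6, 24, and 82/3 for queries 1, 2, and 3.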