Performing signal processing over graphs requires knowledge of the underlying fixed topology. However, graphs often grow in size as new nodes appear over time, and the connectivity of these nodes is typically unknown, which makes downstream tasks more challenging in applications such as cold-start recommendation. We address this challenge for signal interpolation at incoming nodes whose topological connectivity is unknown. Specifically, we propose a stochastic attachment model for incoming nodes parameterized by the attachment probabilities and edge weights. We estimate these parameters in a data-driven fashion, relying only on the attachment behaviour of earlier incoming nodes, with the goal of interpolating the signal value. We study the non-convexity of the problem at hand, derive conditions under which it can be marginally convexified, and propose an alternating projected descent approach that alternates between estimating the attachment probabilities and the edge weights. Numerical experiments with synthetic and real data on cold-start collaborative filtering corroborate our findings.
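The stochastic attachment model described above can be illustrated with a minimal sketch. It assumes that an edge to existing node i forms independently with probability p[i] and carries weight w[i], and that the signal at the incoming node is interpolated as a one-hop weighted sum of the existing signals. The function names and the one-hop interpolation rule are assumptions made for illustration; the abstract does not specify them.

```python
import random


def sample_attachment(p, w, rng):
    """Draw one attachment vector (assumed model): the edge to node i
    forms with probability p[i] and, if present, has weight w[i]."""
    return [w_i if rng.random() < p_i else 0.0 for p_i, w_i in zip(p, w)]


def interpolate(a, x):
    """Assumed one-hop interpolation: weighted sum of existing signals."""
    return sum(a_i * x_i for a_i, x_i in zip(a, x))


def expected_interpolation(p, w, x):
    """Mean of interpolate(a, x) over attachments: sum_i p[i] * w[i] * x[i]."""
    return sum(p_i * w_i * x_i for p_i, w_i, x_i in zip(p, w, x))


if __name__ == "__main__":
    rng = random.Random(0)
    p = [0.5, 0.2, 0.9]   # hypothetical attachment probabilities
    w = [1.0, 2.0, 0.5]   # hypothetical edge weights
    x = [3.0, -1.0, 4.0]  # hypothetical signal on the existing nodes
    draws = [interpolate(sample_attachment(p, w, rng), x) for _ in range(10000)]
    print("Monte Carlo mean:", sum(draws) / len(draws))
    print("Closed-form mean:", expected_interpolation(p, w, x))
```

Under this assumed model the expected interpolated value depends on the parameters only through the products p[i] * w[i], whereas the variance depends on p and w separately, which gives one intuition for why the two parameter sets are estimated in alternation.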
Our capacity to learn representations from data is related to our ability to design filters that can leverage their coupling with the underlying domain. Graph filters are one such tool for network data and have been used in a myriad of applications. But graph filters work only with a fixed number of nodes despite the expanding nature of practical networks. Learning filters in this setting is challenging not only because of the increased dimensions but also because the connectivity is known only up to an attachment model. We propose a filter learning scheme for data over expanding graphs by relying only on such a model. By characterizing the filter stochastically, we develop an empirical risk minimization framework inspired by multi-kernel learning to balance the information inflow and outflow at the incoming nodes. We particularize the approach for denoising and semi-supervised learning (SSL) over expanding graphs and show near-optimal performance compared with baselines relying on the exact topology. For SSL, the proposed scheme uses the incoming node information to improve the task on the existing ones. These findings lay the foundation for learning representations over expanding graphs by relying only on the stochastic connectivity model.
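For readers unfamiliar with graph filters, the sketch below shows the standard polynomial form y = sum_k h[k] S^k x applied to a graph expanded by one incoming node. The way the node is attached here (incoming edges only, so the new node receives information from the existing nodes but does not send any) is an assumption chosen for illustration, as are the helper names; the abstract's scheme balances both inflow and outflow and treats the attachment stochastically.

```python
def matvec(S, x):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(s_ij * x_j for s_ij, x_j in zip(row, x)) for row in S]


def graph_filter(S, h, x):
    """Polynomial graph filter: y = sum_k h[k] * S^k x."""
    y = [0.0] * len(x)
    z = list(x)  # z holds S^k x, starting at k = 0
    for h_k in h:
        y = [y_i + h_k * z_i for y_i, z_i in zip(y, z)]
        z = matvec(S, z)
    return y


def expand(A, a):
    """Append one incoming node (assumed directed attachment): the new
    row a holds edges from existing nodes; the new column is all zeros."""
    rows = [list(row) + [0.0] for row in A]
    rows.append(list(a) + [0.0])
    return rows


if __name__ == "__main__":
    A = [[0.0, 1.0], [1.0, 0.0]]  # hypothetical existing graph
    a = [0.5, 0.5]                # hypothetical attachment vector
    x = [1.0, 2.0, 0.0]           # signal, with 0 at the incoming node
    print(graph_filter(expand(A, a), [0.0, 1.0], x))
```

With filter coefficients h = [0, 1], the output at the incoming node reduces to the weighted sum of the existing signals, and the outputs at the existing nodes are unaffected by the new node because it has no outgoing edges in this sketch.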
Data processing over graphs is usually done on graphs of fixed size. However, graphs often grow with new nodes arriving over time. Knowing the connectivity of these nodes, and thus the expanded graph, is crucial for processing data over the expanded graph. When this information is absent, inferring it becomes essential for the subsequent data processing. This paper contributes in this direction by considering task-driven data processing for incoming nodes without connectivity information. We model the incoming node attachment as a random process governed by parameterized vectors of attachment probabilities and weights. The attachment is driven by the existing graph topology, the corresponding graph signal, and an associated processing task. We consider two such tasks: interpolation at the incoming node and graph signal smoothness. We show that the model implicitly bounds the spectral perturbation between the nominal topology of the expanded graph and the drawn realizations. In the absence of connectivity information, our topology-, task-, and data-aware stochastic attachment performs better than purely data-driven and topology-driven stochastic attachment rules, as confirmed by numerical results over synthetic and real data.
In statistical learning over large datasets, labeling all points is expensive and time-consuming. Semi-supervised classification allows learning with very few labels. Naturally, selecting the few points to label becomes crucial, as the performance relies heavily on the labeled points. The motivation behind active learning is to build an optimal training set with the classifier in mind. Random or heuristic-driven selection schemes either ignore the classification process or are trivially defined. We are interested in the graph structure formed by the data, as seen in citation, social, and biological networks. Accordingly, active semi-supervised learning on graphs selects which nodes to label so as to enhance classification performance. We propose a new methodology to perform active learning for diffusion-based semi-supervised classifiers. In particular, we focus on a classifier which diffuses probability distribu