On par with data-intensive applications, the sheer size of modern linear regression problems creates an ever-growing demand for efficient solvers. Fortunately, a significant percentage of the data accrued can be omitted while maintaining a certain quality of statistical inference with an affordable computational budget. This work introduces means of identifying and omitting less informative observations in an online and data-adaptive fashion. Given streaming data, the related maximum-likelihood estimator is sequentially found using first- and second-order stochastic approximation algorithms. These schemes are well suited when data are inherently censored or when the aim is to save communication overhead in decentralized learning setups. In a different operational scenario, the task of joint censoring and estimation is put forth to solve large-scale linear regressions in a centralized setup. Novel online algorithms are developed enjoying simple closed-form updates and provable (non)asymptotic convergence guarantees. To attain desired censoring patterns and levels of dimensionality reduction, thresholding rules are investigated too. Numerical tests on real and synthetic datasets corroborate the efficacy of the proposed data-adaptive methods compared to data-agnostic random projection-based alternatives.
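To make the idea of omitting less informative observations concrete, here is a minimal sketch of a first-order online update that skips any sample whose innovation (prediction error) is small. It is a plain least-mean-squares recursion with a censoring rule bolted on, not the maximum-likelihood estimators developed in the paper. The names `censored_lms` and `make_stream`, the step size, and the threshold `tau` are illustrative assumptions.

```python
import random


def censored_lms(stream, dim, step=0.05, tau=1.0, sigma=1.0):
    """Online linear regression that ignores samples with small innovations.

    Assumption: a sample is censored when |y - x^T theta| <= tau * sigma,
    where sigma is the (known) noise standard deviation.
    """
    theta = [0.0] * dim
    used = 0
    for x, y in stream:
        innovation = y - sum(t * xi for t, xi in zip(theta, x))
        if abs(innovation) <= tau * sigma:
            continue  # censored: no update, no computation spent on this sample
        theta = [t + step * innovation * xi for t, xi in zip(theta, x)]
        used += 1
    return theta, used


def make_stream(theta_true, n, sigma, seed=0):
    """Synthetic stream y = x^T theta_true + Gaussian noise (illustrative only)."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in theta_true]
        y = sum(t * xi for t, xi in zip(theta_true, x)) + rng.gauss(0.0, sigma)
        yield x, y


theta_true = [1.0, -2.0, 0.5]
theta_hat, used = censored_lms(
    make_stream(theta_true, n=5000, sigma=0.1), dim=3, step=0.05, tau=1.0, sigma=0.1
)
print(f"updates performed: {used} of 5000")
print("estimate:", [round(t, 3) for t in theta_hat])
```

Once the estimate is close to the true parameters, most innovations fall below the threshold, so most of the stream is skipped while the estimate stays near the target.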
Diffusion-based classifiers, such as those relying on the Personalized PageRank and the Heat kernel, enjoy remarkable classification accuracy at modest computational requirements. Their performance, however, is affected by the extent to which the chosen diffusion captures a typically unknown label propagation mechanism that can be specific to the underlying graph and potentially different for each class. The present work introduces a disciplined, data-efficient approach to learning class-specific diffusion functions adapted to the underlying network topology. The novel learning approach leverages the notion of "landing probabilities" of class-specific random walks, which can be computed efficiently, thereby ensuring scalability to large graphs. This is supported by rigorous analysis of the properties of the model as well as the proposed algorithms. Furthermore, a robust version of the classifier facilitates learning even in noisy environments. Classification tests on real networks demonstrate that adapting the diffusion function to the given graph and observed labels significantly improves the performance over fixed diffusions, reaching (and many times surpassing) the classification accuracy of computationally heavier state-of-the-art competing methods that rely on node embeddings and deep neural networks.
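The "landing probabilities" mentioned above can be illustrated with a short sketch: start a random walk at the labeled nodes of a class, record the probability of landing on each node after k steps, and combine the per-step vectors with a set of weights. The sketch below uses fixed PageRank-style weights for illustration; learning the weights per class, which is the paper's contribution, is not shown. The function names, the toy graph, and the choice of nonnegative weights that sum to one are assumptions.

```python
def landing_probabilities(adj, seeds, num_steps):
    """Return [p_1, ..., p_K]; p_k[v] is the probability that a walk started
    uniformly at one of the seed nodes is at node v after k steps."""
    n = len(adj)
    p = [0.0] * n
    for s in seeds:
        p[s] = 1.0 / len(seeds)
    history = []
    for _ in range(num_steps):
        nxt = [0.0] * n
        for u, nbrs in enumerate(adj):
            if p[u] == 0.0 or not nbrs:
                continue
            share = p[u] / len(nbrs)  # move to a uniformly chosen neighbor
            for v in nbrs:
                nxt[v] += share
        p = nxt
        history.append(p)
    return history


def diffuse(landing, weights):
    """Per-node score: weighted sum of the K landing-probability vectors."""
    n = len(landing[0])
    return [sum(w * pk[v] for w, pk in zip(weights, landing)) for v in range(n)]


def pagerank_weights(alpha, num_steps):
    """Truncated, normalized Personalized-PageRank weights, proportional to alpha**k."""
    raw = [(1.0 - alpha) * alpha ** k for k in range(1, num_steps + 1)]
    total = sum(raw)
    return [r / total for r in raw]


# Toy graph (assumption): two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
weights = pagerank_weights(alpha=0.85, num_steps=3)
scores_a = diffuse(landing_probabilities(adj, seeds=[0], num_steps=3), weights)
scores_b = diffuse(landing_probabilities(adj, seeds=[5], num_steps=3), weights)
labels = ["A" if a > b else "B" for a, b in zip(scores_a, scores_b)]
print(labels)
```

Each landing vector costs one sparse pass over the edges, which is why this family of classifiers scales to large graphs.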
Radio tomographic imaging (RTI) is an emerging technology to locate physical objects in a geographical area covered by wireless networks. From the attenuation measurements collected at spatially distributed sensors, radio tomography capitalizes on spatial loss fields (SLFs) measuring the absorption of radio frequency waves at each location along the propagation path. These SLFs can be utilized for interference management in wireless communication networks, environmental monitoring, and survivor localization after natural disasters such as earthquakes. Key to the success of RTI is accurately modeling the shadowing effects as the bi-dimensional integral of the SLF scaled by a weight function, which is estimated using regularized regression. However, the existing approaches are less effective when the propagation environment is heterogeneous. To cope with this, the present work introduces a piecewise homogeneous SLF governed by a hidden Markov random field (MRF) model. Efficient and tractable SLF estimators are developed by leveraging Markov chain Monte Carlo (MCMC) techniques. Furthermore, an uncertainty sampling method is developed to adaptively collect informative measurements in estimating the SLF. Numerical tests using synthetic and real datasets demonstrate capabilities of the proposed algorithm for radio tomography and channel-gain estimation.
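The shadowing model described above, a two-dimensional integral of the SLF scaled by a weight function, can be approximated on a grid as a weighted sum over cells. The sketch below assumes the normalized ellipse weight, a common choice in the RTI literature and not necessarily the one used in this paper. The grid, the obstacle, the ellipse `width`, and the function names are illustrative assumptions.

```python
import math


def ellipse_weight(point, tx, rx, width=0.5):
    """Normalized ellipse model (assumed): a cell contributes 1/sqrt(d) if it lies
    inside the ellipse with foci tx and rx, and nothing otherwise."""
    d = math.dist(tx, rx)
    detour = math.dist(point, tx) + math.dist(point, rx)
    return 1.0 / math.sqrt(d) if detour < d + width else 0.0


def shadowing(slf, cell, tx, rx, width=0.5):
    """Riemann-sum approximation of the weighted integral of the SLF.

    slf[i][j] is the loss value of the cell centered at ((j+0.5)*cell, (i+0.5)*cell).
    """
    total = 0.0
    for i, row in enumerate(slf):
        for j, value in enumerate(row):
            center = ((j + 0.5) * cell, (i + 0.5) * cell)
            total += ellipse_weight(center, tx, rx, width) * value
    return total * cell * cell


# 10 x 10 grid of unit cells with a 2 x 2 absorbing obstacle in the middle.
slf = [[0.0] * 10 for _ in range(10)]
for i in (4, 5):
    for j in (4, 5):
        slf[i][j] = 1.0

blocked = shadowing(slf, cell=1.0, tx=(0.0, 5.0), rx=(10.0, 5.0))  # link crosses the obstacle
clear = shadowing(slf, cell=1.0, tx=(0.0, 0.5), rx=(10.0, 0.5))    # link misses it
print(blocked, clear)
```

Stacking one such weighted sum per transmitter-receiver pair gives the linear system from which an SLF estimate is obtained by regularized regression.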
This paper introduces PerDif, a novel framework for learning personalized diffusions over item-to-item graphs for top-n recommendation. PerDif learns the teleportation probabilities of a time-inhomogeneous random walk with restarts capturing a user-specific underlying item exploration process. Such an approach can lead to significant improvements in recommendation accuracy, while also providing useful information about the users in the system. Per-user fitting can be performed in parallel and very efficiently even in large-scale settings. A comprehensive set of experiments on real-world datasets demonstrates the scalability as well as the qualitative merits of the proposed framework. PerDif achieves high recommendation accuracy, outperforming state-of-the-art competing approaches, including several recently proposed methods relying on deep neural networks.
CCS CONCEPTS • Information systems → Recommender systems.
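A time-inhomogeneous random walk with restarts can be sketched as follows: at step k the walker jumps back to the user's own items with probability `mu_k`, and otherwise follows the item-to-item transition matrix. The sketch only evaluates such a walk for given restart probabilities; fitting them per user, which is what the paper proposes, is not shown. The transition matrix, the restart schedule, and the function names are hypothetical.

```python
def restart_walk(transition, start, restart_probs):
    """Propagate a distribution over items through a walk whose restart
    probability may differ at every step (hence time-inhomogeneous)."""
    n = len(start)
    pi = list(start)
    for mu in restart_probs:
        stepped = [sum(pi[i] * transition[i][j] for i in range(n)) for j in range(n)]
        pi = [mu * start[j] + (1.0 - mu) * stepped[j] for j in range(n)]
    return pi


def top_n(scores, seen, n):
    """Rank the items the user has not interacted with by their score."""
    candidates = [j for j in range(len(scores)) if j not in seen]
    return sorted(candidates, key=lambda j: scores[j], reverse=True)[:n]


# Hypothetical row-stochastic item-to-item transition matrix over 4 items.
transition = [
    [0.00, 0.50, 0.50, 0.00],
    [0.50, 0.00, 0.50, 0.00],
    [0.25, 0.25, 0.00, 0.50],
    [0.00, 0.00, 1.00, 0.00],
]
start = [1.0, 0.0, 0.0, 0.0]  # the user has interacted with item 0 only
scores = restart_walk(transition, start, restart_probs=[0.0, 0.0, 0.5])
recommended = top_n(scores, seen={0}, n=2)
print(recommended)
```

Because each user's restart schedule only touches that user's own starting distribution, the walks for different users are independent and can run in parallel.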
The deluge of networked data motivates the development of algorithms for computation- and communication-efficient information processing. In this context, three data-adaptive censoring strategies are introduced to considerably reduce the computation and communication overhead of decentralized recursive least-squares (D-RLS) solvers. The first relies on alternating minimization and the stochastic Newton iteration to minimize a network-wide cost, which discards observations with small innovations. In the resultant algorithm, each node performs local data-adaptive censoring to reduce computations, while exchanging its local estimate with neighbors so as to consent on a network-wide solution. The communication cost is further reduced by the second strategy, which prevents a node from transmitting its local estimate to neighbors when the innovation it induces to incoming data is minimal. In the third strategy, not only transmitting, but also receiving estimates from neighbors is prohibited when data-adaptive censoring is in effect. For all strategies, a simple criterion is provided for selecting the threshold of innovation to reach a prescribed average data reduction. The novel censoring-based (C)D-RLS algorithms are proved convergent to the optimal argument in the mean-root deviation sense. Numerical experiments validate the effectiveness of the proposed algorithms in reducing computation and communication overhead.
Index Terms: Decentralized estimation, networks, recursive least-squares (RLS), data-adaptive censoring
Other than the star topology studied in the aforementioned works, [20] investigates censoring for a tree structure. If a node's local likelihood ratio exceeds a threshold, its local data is sent to its parent node for fusion. A fully decentralized setting is considered in [3], where each node determines whether to transmit its local estimate to its neighbors by comparing the local estimate with the weighted average of its neighbors.
Nevertheless, [3] aims at mitigating only the communication cost, while the present work also considers reduction of the computational cost across the network. Furthermore, the censoring-based decentralized linear regression algorithm in [14] deals with optimal full-complexity estimation when observations are partially known or corrupted. This is different from our context, where censoring is deliberately introduced to reduce computational and communication costs for decentralized linear regression.
B. Our contributions and organization
The present paper introduces three data-adaptive online censoring strategies for decentralized linear regression. The resultant CD-RLS algorithms incur low computational and communication costs, and are thus attractive for large-scale network applications requiring decentralized solvers of linear regressions. Unlike most related works that specifically target wireless sensor networks (WSNs), the proposed algorithms may be used in a broader context of decentralized linear regression using multiple computing platforms. Of particular interest are ca...
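The CD-RLS abstract above mentions a simple criterion for choosing the innovation threshold that attains a prescribed average data reduction. One way to sketch such a rule, assuming the innovation is zero-mean Gaussian with known standard deviation `sigma` once the estimate is accurate, is to pick `tau` so that the probability of `|innovation| <= tau * sigma` equals the desired censoring fraction. The Gaussian assumption and the function names are illustrative and may differ from the paper's exact criterion.

```python
import random
from statistics import NormalDist


def censoring_threshold(censor_fraction):
    """Return tau with P(|z| <= tau) = censor_fraction for standard normal z."""
    if not 0.0 < censor_fraction < 1.0:
        raise ValueError("censor_fraction must lie strictly between 0 and 1")
    return NormalDist().inv_cdf(0.5 + censor_fraction / 2.0)


def is_censored(innovation, tau, sigma):
    """Censoring rule: discard the observation when its innovation is small."""
    return abs(innovation) <= tau * sigma


# Empirical check on synthetic Gaussian innovations (illustrative only).
tau = censoring_threshold(0.90)
sigma = 0.3
rng = random.Random(1)
innovations = [rng.gauss(0.0, sigma) for _ in range(20000)]
observed = sum(is_censored(e, tau, sigma) for e in innovations) / len(innovations)
print(f"tau = {tau:.3f}, observed censoring fraction = {observed:.3f}")
```

In a decentralized solver, each node would apply `is_censored` to its own incoming observation before deciding whether to update, and under the second and third strategies, whether to communicate.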