In this paper, we develop algorithms for robust linear regression by leveraging the connection between robust regression and sparse signal recovery. We explicitly model the measurement noise as a combination of two terms: the first accounts for regular measurement noise, modeled as zero-mean Gaussian noise, and the second captures the impact of outliers. Since outliers affect only a small fraction of the measurements, this outlier component is a sparse vector, which allows sparse signal reconstruction methods to be brought to bear on the robust regression problem. Maximum a posteriori (MAP) and empirical Bayesian inference based algorithms are developed for this purpose. Experimental studies on simulated and real data sets demonstrate the effectiveness of the proposed algorithms.
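The decomposition described above can be made concrete with a minimal sketch (not the paper's MAP or empirical Bayesian algorithms): model the observations as y = Ax + n + e, where n is dense Gaussian noise and e is a sparse outlier vector, and estimate x and e by alternating least squares with soft-thresholding. The function name, iteration count, and regularization value below are illustrative assumptions.

```python
import numpy as np

def robust_regression(A, y, lam=1.0, n_iter=50):
    """Estimate x and a sparse outlier vector e from y = A @ x + noise + e.

    Alternates between a least-squares update for x (with outliers removed)
    and a soft-thresholding update for e (the l1-regularized residual).
    """
    e = np.zeros_like(y)
    for _ in range(n_iter):
        # regression coefficients after subtracting the current outlier estimate
        x, *_ = np.linalg.lstsq(A, y - e, rcond=None)
        # soft-threshold the residual: small residuals -> 0, large -> outliers
        r = y - A @ x
        e = np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)
    return x, e
```

Because the soft-threshold zeroes out residuals below `lam`, only grossly corrupted measurements are absorbed into e, so the least-squares step effectively sees clean data.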
We study the tradeoffs between the number of measurements, the signal sparsity level, and the measurement noise level for exact support recovery of sparse signals from random noisy measurements. By drawing an analogy between exact support recovery and communication over the Gaussian multiple access channel, and exploiting mathematical tools developed for the latter problem, we derive sharp asymptotic sufficient and necessary conditions for exact support recovery. Specifically, when the number of nonzero entries is held fixed, we develop the exact asymptotics on the number of measurements required for support recovery. When the number of nonzero entries is allowed to grow in certain regimes, we obtain sufficient conditions tighter than existing results. The proposed information-theoretic framework for analyzing the performance of support recovery is further shown to handle a variety of sparse signal recovery models.
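To make "exact support recovery" concrete, here is a small illustrative experiment using orthogonal matching pursuit (OMP). This is not a method from the abstract above, which is analytical; OMP merely serves as a standard recovery algorithm whose success or failure at identifying the support can be checked against the measurement count. All dimensions and names are assumptions.

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal matching pursuit: greedily select k columns of A to explain y."""
    residual = y.copy()
    support = []
    for _ in range(k):
        # pick the column most correlated with the current residual
        j = int(np.argmax(np.abs(A.T @ residual)))
        support.append(j)
        # re-fit on all selected columns and update the residual
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    return sorted(support)
```

Exact support recovery means the returned index set equals the true support of the sparse signal; sweeping the number of rows of A in such an experiment traces out the measurement/sparsity/noise tradeoff the analysis characterizes.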
Web search is seeing a paradigm shift from keyword-based search to an entity-centric organization of web data. To support web search with this deeper level of understanding, a web-scale entity linking system must have three key properties: First, its feature extraction must be robust to the diversity of web documents and their varied writing styles and content structures. Second, it must maintain high-precision linking for "tail" (unpopular) entities, robust both to confounding entities outside the knowledge base and to entity profiles with minimal information. Finally, the system must represent large-scale knowledge bases with a scalable and powerful feature representation. We have built and deployed a web-scale unsupervised entity linking system for a commercial search engine that addresses these requirements by combining new developments in sparse signal recovery to identify the most discriminative features from noisy, free-text web documents; explicit modeling of out-of-knowledge-base entities to improve precision at the tail; and a new phrase-unigram language model that efficiently captures high-order dependencies in lexical features. Using a knowledge base of 100M unique people from a popular social networking site, we present experimental results in the challenging domain of people-linking at the tail, where most entities have limited web presence. Our experimental results show that this system substantially improves on the precision-recall tradeoff over baseline methods, achieving precision over 95% with recall over 60%.
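The sparse-signal-recovery step mentioned above, selecting a few discriminative features from a large noisy feature vector, can be sketched with an l1-regularized least-squares (Lasso) solver via iterative soft-thresholding. This is an illustrative stand-in, not the deployed system's formulation; the function name, dimensions, and regularization value are all assumptions.

```python
import numpy as np

def lasso_ista(A, y, lam, n_iter=500):
    """Solve min_x 0.5 * ||A x - y||^2 + lam * ||x||_1 by ISTA.

    The l1 penalty drives most coordinates of x to exactly zero, so the
    surviving coordinates act as the selected discriminative features.
    """
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = A.T @ (A @ x - y)            # gradient of the smooth term
        x = x - g / L                    # gradient step
        x = np.sign(x) * np.maximum(np.abs(x) - lam / L, 0.0)  # shrinkage
    return x
```

In a feature-selection setting, the columns of A would be candidate lexical features and the nonzero entries of the solution indicate which features are retained.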