While adversarial training can improve robust accuracy (against an adversary), it sometimes hurts standard accuracy (when there is no adversary). Previous work has studied this tradeoff between standard and robust accuracy, but only in the setting where no predictor performs well on both objectives in the infinite data limit. In this paper, we show that even when the optimal predictor with infinite data performs well on both objectives, a tradeoff can still manifest itself with finite data. Furthermore, since our construction is based on a convex learning problem, we rule out optimization concerns, thus laying bare a fundamental tension between robustness and generalization. Finally, we show that robust self-training mostly eliminates this tradeoff by leveraging unlabeled data.
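As a loose, self-contained illustration of the procedures mentioned above (not the paper's construction or experiments), the sketch below compares standard training, adversarial training, and robust self-training for a linear logistic model under an ℓ∞ adversary on synthetic Gaussian data. For a linear score, the worst-case ℓ∞ perturbation of radius eps simply reduces the margin by eps·‖w‖₁, so no inner attack loop is needed. All names, data, and hyperparameters are illustrative.

```python
# Minimal sketch: standard training vs. adversarial training vs. robust
# self-training for a linear logistic model with an l_inf adversary.
# For a linear score w.x, the worst-case perturbation of radius eps reduces
# the margin by eps * ||w||_1, giving a closed-form robust logistic loss.
import numpy as np

rng = np.random.default_rng(0)
d, eps, lam, lr, steps = 20, 0.1, 1e-3, 0.1, 2000

def make_data(n):
    y = rng.choice([-1.0, 1.0], size=n)
    x = y[:, None] * 0.5 + rng.normal(size=(n, d))   # class-dependent mean shift
    return x, y

def train_logistic(x, y, eps=0.0):
    """Gradient descent on the (adversarially) robust logistic loss."""
    w = np.zeros(d)
    for _ in range(steps):
        margin = y * (x @ w) - eps * np.abs(w).sum()  # worst-case margin
        p = 1.0 / (1.0 + np.exp(margin))              # -d(loss)/d(margin)
        grad = -(x * (y * p)[:, None]).mean(0) + eps * p.mean() * np.sign(w) + lam * w
        w -= lr * grad
    return w

x_lab, y_lab = make_data(50)       # small labeled set
x_unl, _ = make_data(5000)         # large unlabeled set
x_test, y_test = make_data(10000)

w_std = train_logistic(x_lab, y_lab, eps=0.0)   # standard training
w_adv = train_logistic(x_lab, y_lab, eps=eps)   # adversarial training

# Robust self-training: pseudo-label the unlabeled data with the standard
# model, then train robustly on labeled + pseudo-labeled points.
y_pseudo = np.sign(x_unl @ w_std)
w_rst = train_logistic(np.vstack([x_lab, x_unl]),
                       np.concatenate([y_lab, y_pseudo]), eps=eps)

def accuracy(w, x, y, eps=0.0):
    return np.mean(y * (x @ w) - eps * np.abs(w).sum() > 0)

for name, w in [("standard", w_std), ("adversarial", w_adv), ("robust self-training", w_rst)]:
    print(f"{name:>22}: std acc {accuracy(w, x_test, y_test):.3f}, "
          f"robust acc {accuracy(w, x_test, y_test, eps):.3f}")
```

The closed-form worst-case margin is specific to linear models; for more complex models the inner maximization would typically be approximated by an attack such as projected gradient descent.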
The Hidden Markov Model (HMM) is one of the mainstays of statistical modeling of discrete time series, with applications including speech recognition, computational biology, computer vision and econometrics. Estimating an HMM from its observation process is often addressed via the Baum-Welch algorithm, which is known to be susceptible to local optima. In this paper, we first give a general characterization of the basin of attraction associated with any global optimum of the population likelihood. By exploiting this characterization, we provide non-asymptotic finite sample guarantees on the Baum-Welch updates, guaranteeing geometric convergence to a small ball of radius on the order of the minimax rate around a global optimum. As a concrete example, we prove a linear rate of convergence for a hidden Markov mixture of two isotropic Gaussians given a suitable mean separation and an initialization within a ball of large radius around (one of) the true parameters. To our knowledge, these are the first rigorous local convergence guarantees to global optima for the Baum-Welch algorithm in a setting where the likelihood function is nonconvex. We complement our theoretical results with thorough numerical simulations studying the convergence of the Baum-Welch algorithm and illustrating the accuracy of our predictions.
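To connect the theory to the updates themselves, here is a minimal Python sketch of the Baum-Welch (EM) iteration for a two-state HMM whose emissions are Gaussians with means −μ and +μ, the kind of model used as the concrete example above. The scaled forward-backward recursions form the E-step, and a single shared mean parameter is re-estimated in the M-step; transitions and the variance are held fixed for simplicity. The setup and all names are illustrative, not taken from the paper.

```python
# Minimal sketch of Baum-Welch (EM) for a 2-state HMM with Gaussian emissions
# at means -mu and +mu (unit variance); transitions and variance held fixed.
import numpy as np

rng = np.random.default_rng(1)

# simulate a 2-state HMM: symmetric transitions, emission means -mu and +mu
T, mu_true, p_stay, sigma = 2000, 1.5, 0.8, 1.0
z = np.zeros(T, dtype=int)
for t in range(1, T):
    z[t] = z[t - 1] if rng.random() < p_stay else 1 - z[t - 1]
obs = np.where(z == 0, -mu_true, mu_true) + sigma * rng.normal(size=T)

def baum_welch(y, mu0, n_iter=50):
    """EM updates for mu, with the transition matrix and variance held fixed."""
    T = len(y)
    A = np.array([[p_stay, 1 - p_stay], [1 - p_stay, p_stay]])
    pi = np.array([0.5, 0.5])
    mu = mu0
    for _ in range(n_iter):
        means = np.array([-mu, mu])
        # emission likelihoods up to a constant factor (it cancels below)
        B = np.exp(-0.5 * (y[:, None] - means[None, :]) ** 2 / sigma**2)

        # E-step: scaled forward-backward recursions
        alpha = np.zeros((T, 2)); beta = np.zeros((T, 2)); c = np.zeros(T)
        alpha[0] = pi * B[0]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[t]
            c[t] = alpha[t].sum(); alpha[t] /= c[t]
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = (A @ (B[t + 1] * beta[t + 1])) / c[t + 1]
        gamma = alpha * beta                          # posterior state marginals
        gamma /= gamma.sum(axis=1, keepdims=True)

        # M-step for the shared mean magnitude mu (state 0 has mean -mu)
        mu = np.sum((gamma[:, 1] - gamma[:, 0]) * y) / T
    return mu

print("estimated mu:", baum_welch(obs, mu0=0.5))   # should approach mu_true = 1.5
```

Each EM pass costs time linear in the sequence length via the forward-backward recursions; the paper's guarantees concern how many such passes are needed and how close the iterates get to a global optimum.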
Early stopping of iterative algorithms is a widely used form of regularization in statistics, commonly applied in conjunction with boosting and related gradient-type algorithms. Although consistency results have been established in some settings, such estimators are less well understood than their analogues based on penalized regularization. In this paper, for a relatively broad class of loss functions and boosting algorithms (including $L^2$-boost, LogitBoost and AdaBoost, among others), we exhibit a direct connection between the performance of a stopped iterate and the localized Gaussian complexity of the associated function class. This connection allows us to show that local fixed point analysis of Gaussian or Rademacher complexities, now standard in the analysis of penalized estimators, can be used to derive optimal stopping rules. We derive such stopping rules in detail for various kernel classes, and illustrate the correspondence of our theory with practice for Sobolev kernel classes.

* Yuting Wei and Fanny Yang contributed equally to this work.

In the fixed design setting considered here, each response $Y_i$ is drawn from a conditional distribution $\mathbb{P}_{Y|x_i}$ which depends on $x_i$. Later in the paper, we also discuss the consequences of our results for the case of random design, where the $(X_i, Y_i)$ pairs are drawn in an i.i.d. fashion from the joint distribution $\mathbb{P} = \mathbb{P}_X \mathbb{P}_{Y|X}$ for some distribution $\mathbb{P}_X$ on the covariates. In this section, we provide some necessary background on a gradient-type algorithm that is often referred to as a boosting algorithm. We also briefly discuss reproducing kernel Hilbert spaces before turning to a precise formulation of the problem studied in this paper.

Boosting and early stopping. Consider a cost function $\phi : \mathbb{R} \times \mathbb{R} \to [0, \infty)$, where the non-negative scalar $\phi(y, \theta)$ denotes the cost associated with predicting $\theta$ when the true response is $y$. Some common examples of loss functions $\phi$ that we consider in later sections include:
• the least-squares loss $\phi(y, \theta) := \tfrac{1}{2}(y - \theta)^2$ that underlies $L^2$-boosting [9],
• the logistic regression loss $\phi(y, \theta) = \ln(1 + e^{-y\theta})$ that underlies the LogitBoost algorithm [15, 16], and
• the exponential loss $\phi(y, \theta) = \exp(-y\theta)$ that underlies the AdaBoost algorithm [14].
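Below is a minimal sketch of the kind of procedure being analyzed, assuming a first-order Sobolev kernel $K(s, t) = \min(s, t)$ and toy 1-D data: kernel $L^2$-boosting performs gradient updates on the empirical least-squares loss, and a held-out split is used here as a practical stand-in for a theoretical stopping rule. Everything in the snippet (data, kernel choice, step size) is illustrative.

```python
# Minimal sketch: kernel L2-boosting, i.e. functional gradient updates on the
# empirical least-squares loss in an RKHS, with early stopping chosen on a
# held-out split (a practical stand-in for a theoretical stopping rule).
import numpy as np

rng = np.random.default_rng(2)
n, step, max_iter = 200, 0.5, 5000

# toy 1-D regression data; first-order Sobolev kernel K(s, t) = min(s, t)
x = np.sort(rng.uniform(size=n))
y = np.sin(2 * np.pi * x) + 0.5 * rng.normal(size=n)

train = rng.random(n) < 0.7
x_tr, x_ho, y_tr, y_ho = x[train], x[~train], y[train], y[~train]
n_tr = len(x_tr)
K = np.minimum(x_tr[:, None], x_tr[None, :])       # train kernel matrix
K_ho = np.minimum(x_ho[:, None], x_tr[None, :])    # holdout-vs-train kernel

coef = np.zeros(n_tr)          # coefficients of f = sum_i coef_i K(., x_i)
f_tr = np.zeros(n_tr)          # fitted values at the training points
best_err, best_t = np.inf, 0

for t in range(1, max_iter + 1):
    resid = y_tr - f_tr
    coef += (step / n_tr) * resid          # L2-boosting / gradient step
    f_tr = K @ coef
    ho_err = np.mean((y_ho - K_ho @ coef) ** 2)
    if ho_err < best_err:
        best_err, best_t = ho_err, t

final_err = np.mean((y_ho - K_ho @ coef) ** 2)
print(f"early stop at t = {best_t}: holdout MSE {best_err:.3f} "
      f"(vs {final_err:.3f} after {max_iter} iterations)")
```

The paper's contribution is to replace such a holdout heuristic with a data-dependent stopping rule derived from the localized Gaussian complexity of the function class; the snippet only illustrates the iterate being stopped.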
This paper considers the recovery of continuous signals in infinite dimensional spaces from the magnitude of their frequency samples. It proposes a sampling scheme which involves a combination of oversampling and modulations with complex exponentials. Sufficient conditions are given such that almost every signal with compact support can be reconstructed up to a unimodular constant using only its magnitude samples in the frequency domain. Finally, it is shown that an average sampling rate of four times the Nyquist rate is enough to reconstruct almost every time-limited signal.
This paper considers the problem of signal recovery from magnitude measurements for signals in modulation invariant spaces. It proposes a measurement setup such that almost every signal in such a signal space can be reconstructed from its amplitude measurements up to a global constant phase and with a sampling rate of four times the rate of innovation of the signal space. The applicability of the proposed scheme under noisy measurements is demonstrated by computer simulations.
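Neither paper's reconstruction procedure is reproduced here; the short sketch below only illustrates the measurement model the two abstracts share, using a zero-padded discrete signal as a stand-in for a time-limited signal. Modulating by a complex exponential shifts the spectrum, so magnitude samples of modulated copies probe the spectrum on shifted grids, and a global unimodular factor is invisible to every such sample, which is why uniqueness can only hold up to that constant.

```python
# Illustration of the shared measurement model only (not either paper's
# reconstruction procedure): magnitude samples of the DFT of a compactly
# supported signal and of its modulated copies.
import numpy as np

rng = np.random.default_rng(4)
n, oversample = 16, 4                 # support length, 4x oversampling
N = oversample * n

x = rng.normal(size=n) + 1j * rng.normal(size=n)   # compactly supported signal
x_pad = np.concatenate([x, np.zeros(N - n)])

def magnitude_samples(sig, lam):
    """|DFT| samples of sig[t] * exp(2*pi*1j*lam*t/N) on the oversampled grid."""
    t = np.arange(N)
    return np.abs(np.fft.fft(sig * np.exp(2j * np.pi * lam * t / N)))

# (1) modulating by exp(2*pi*1j*lam*t/N) circularly shifts the DFT by lam bins,
#     so the modulated magnitudes probe the spectrum on a shifted grid
m0 = magnitude_samples(x_pad, lam=0)
m3 = magnitude_samples(x_pad, lam=3)
print("modulation = spectral shift:", np.allclose(m3, np.roll(m0, 3)))

# (2) a unimodular constant leaves every magnitude sample unchanged, so
#     reconstruction can only be unique up to such a factor
c = np.exp(1j * 0.7)
print("unimodular factor invisible:", np.allclose(magnitude_samples(c * x_pad, 3), m3))
```

The papers' results show that, with suitable modulations and sufficient oversampling (on average four times the Nyquist rate, respectively four times the rate of innovation), such magnitude samples determine almost every signal up to that constant.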