Summary. We introduce a new estimator, the simultaneous multiscale change point estimator SMUCE, for the change point problem in exponential family regression. An unknown step function is estimated by minimizing the number of change points over the acceptance region of a multiscale test at a level α. The probability of overestimating the true number of change points K is controlled by the asymptotic null distribution of the multiscale test statistic. Further, we derive exponential bounds for the probability of underestimating K. By balancing these quantities, α will be chosen such that the probability of correctly estimating K is maximized. All results are even non-asymptotic for the normal case. On the basis of these bounds, we construct (asymptotically) honest confidence sets for the unknown step function and its change points. At the same time, we obtain exponential bounds for estimating the change point locations which for example yield the minimax rate O(n⁻¹) up to a log-term. Finally, the simultaneous multiscale change point estimator achieves the optimal detection rate of vanishing signals as n → ∞, even for an unbounded number of change points. We illustrate how dynamic programming techniques can be employed for efficient computation of estimators and confidence regions. The performance of the multiscale approach proposed is illustrated by simulations and in two cutting-edge applications from genetic engineering and photoemission spectroscopy.
Previously, the convergence analysis for linear statistical inverse problems has mainly focused on spectral cut-off and Tikhonov-type estimators. Spectral cut-off estimators achieve minimax rates for a broad range of smoothness classes and operators, but their practical usefulness is limited by the fact that they require a complete spectral decomposition of the operator. Tikhonov estimators are simpler to compute but still involve the inversion of an operator and achieve minimax rates only in restricted smoothness classes. In this paper we introduce a unifying technique to study the mean square error of a large class of regularization methods (spectral methods) including the aforementioned estimators as well as many iterative methods, such as ν-methods and the Landweber iteration. The latter estimators converge at the same rate as spectral cut-off but require only matrix-vector products. Our results are applied to various problems; in particular we obtain precise convergence rates for satellite gradiometry, L2-boosting, and errors in variable problems.
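To make the remark about matrix-vector products concrete, the following is a minimal sketch of the Landweber iteration for a small linear system A x = y. It is an illustration under stated assumptions, not the paper's implementation: the names `matvec`, `rmatvec` and `landweber`, the 2×2 operator, the noise-free data and the step size `omega` are all hypothetical choices. The update is the standard one, x ← x + ω Aᵀ(y − A x), with 0 < ω < 2/‖A‖², and the number of iterations acts as the regularization parameter.

```python
def matvec(A, x):
    """Return A x for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Return A^T r without forming the transpose explicitly."""
    n_cols = len(A[0])
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(n_cols)]


def landweber(A, y, omega, n_iter):
    """Run n_iter Landweber steps x <- x + omega * A^T (y - A x), starting at 0.

    Only products with A and A^T are needed; no inversion or spectral
    decomposition is performed.
    """
    x = [0.0] * len(A[0])
    for _ in range(n_iter):
        residual = [yi - axi for yi, axi in zip(y, matvec(A, x))]
        gradient_step = rmatvec(A, residual)
        x = [xj + omega * gj for xj, gj in zip(x, gradient_step)]
    return x


# Hypothetical toy problem: the exact solution is x = (1, 1).
A = [[2.0, 0.0], [0.0, 1.0]]
y = [2.0, 1.0]

# ||A||^2 = 4, so omega = 0.2 satisfies 0 < omega < 2 / ||A||^2.
x_early = landweber(A, y, omega=0.2, n_iter=3)    # stopped early: strongly regularized
x_late = landweber(A, y, omega=0.2, n_iter=200)   # close to the exact solution
```

In this toy problem the component belonging to the larger singular value is recovered after a few steps, while the component belonging to the smaller one converges more slowly. Stopping early therefore damps the poorly determined directions, which is the sense in which the iteration count regularizes.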
Residual dipolar couplings (RDCs) provide information about the dynamic average orientation of internuclear vectors and amplitudes of motion up to milliseconds. They complement relaxation methods, especially on a time-scale window that we have called supra-τc (τc < supra-τc < 50 μs). Here we present a robust approach called Self-Consistent RDC-based Model-free analysis (SCRM) that delivers RDC-based order parameters, independent of the details of the structure used for alignment tensor calculation, as well as the dynamic average orientation of the internuclear vectors in the protein structure in a self-consistent manner. For ubiquitin, the SCRM analysis yields an average RDC-derived order parameter of the NH vectors S²_rdc = 0.72 ± 0.02 compared to S²_LS = 0.778 ± 0.003 for the Lipari-Szabo order parameters, indicating that the inclusion of the supra-τc window increases the averaged amplitude of mobility observed in the sub-τc window by about 34%. For the β-strand spanned by residues Lys48 to Leu50, an alternating pattern of backbone NH RDC order parameter S²_rdc(NH) = (0.59, 0.72, 0.59) was extracted. The backbone of Lys48, whose side chain is known to be involved in the poly-ubiquitylation process that leads to protein degradation, is very mobile on the supra-τc time scale (S²_rdc(NH) = 0.59 ± 0.03), while it is inconspicuous (S²_LS(NH) = 0.82) on the sub-τc as well as on μs-ms relaxation dispersion time scales. The results of this work differ from previous RDC dynamics studies of ubiquitin in the sense that the results are essentially independent of structural noise, providing a much more robust assessment of dynamic effects that underlie the RDC data.
We study the asymptotics for jump-penalized least squares regression aiming at approximating a regression function by piecewise constant functions. Besides conventional consistency and convergence rates of the estimates in L²([0, 1)), our results cover other metrics like the Skorokhod metric on the space of càdlàg functions and uniform metrics on C([0, 1]). We will show that these estimators are in an adaptive sense rate optimal over certain classes of "approximation spaces." Special cases are the class of functions of bounded variation, (piecewise) Hölder continuous functions of order 0 < α ≤ 1, and the class of step functions with a finite but arbitrary number of jumps. In the latter setting, we will also deduce the rates known from change-point analysis for detecting the jumps. Finally, the issue of fully automatic selection of the smoothing parameter is addressed.
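As an illustration of what a jump-penalized least squares fit computes, the sketch below minimizes the sum of squared residuals plus γ times the number of jumps over all piecewise constant fits, using a standard O(n²) dynamic programme over the position of the last segment. The abstract does not describe an algorithm, so this is an assumption about one common way to compute such estimators; the function name `potts_fit`, the toy data and the values of `gamma` are hypothetical.

```python
def potts_fit(y, gamma):
    """Minimize sum((y_i - f_i)^2) + gamma * (number of jumps of f)
    over piecewise constant f. Returns (fitted values, jump indices)."""
    n = len(y)
    # Prefix sums give the squared error of any segment in O(1).
    s1, s2 = [0.0], [0.0]
    for v in y:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def sse(l, r):
        """Squared error of y[l:r] around its own mean."""
        seg_sum = s1[r] - s1[l]
        return (s2[r] - s2[l]) - seg_sum * seg_sum / (r - l)

    # best[r] = optimal penalized cost for y[:r]; best[0] = -gamma so that
    # the first segment does not pay for a jump.
    best = [0.0] * (n + 1)
    best[0] = -gamma
    last_start = [0] * (n + 1)
    for r in range(1, n + 1):
        best_val = None
        for l in range(r):
            val = best[l] + gamma + sse(l, r)
            if best_val is None or val < best_val:
                best_val, last_start[r] = val, l
        best[r] = best_val

    # Backtrack the optimal segmentation from the right end.
    segments = []
    r = n
    while r > 0:
        l = last_start[r]
        segments.append((l, r))
        r = l
    segments.reverse()

    fit = []
    for l, r in segments:
        fit.extend([(s1[r] - s1[l]) / (r - l)] * (r - l))
    jumps = [l for l, _ in segments[1:]]
    return fit, jumps


# Hypothetical toy data: a step from level 0 to level 2 at index 3.
y = [0.1, -0.1, 0.0, 2.1, 1.9, 2.0]
fit, jumps = potts_fit(y, gamma=0.5)
fit_smooth, jumps_smooth = potts_fit(y, gamma=100.0)
```

With the moderate penalty the fit has a single jump at index 3; with the very large penalty every jump is too expensive and the fit is the overall mean. This is the role of γ as a smoothing parameter whose selection the abstract refers to.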
Summary. Uniform confidence bands for densities f via non-parametric kernel estimates were first constructed by Bickel and Rosenblatt. In this paper this is extended to confidence bands in the deconvolution problem g = f * ψ for an ordinary smooth error density ψ. Under certain regularity conditions, we obtain asymptotic uniform confidence bands based on the asymptotic distribution of the maximal deviation (L∞-distance) between a deconvolution kernel estimator f̂ and f. Further, consistency of the simple non-parametric bootstrap is proved. For our theoretical developments the bias is simply corrected by choosing an undersmoothing bandwidth. For practical purposes we propose a new data-driven bandwidth selector that is based on heuristic arguments, which aims at minimizing the L∞-distance between f̂ and f. Although not constructed explicitly to undersmooth the estimator, a simulation study reveals that the bandwidth selector suggested performs well in finite samples, in terms of both area and coverage probability of the resulting confidence bands. Finally the methodology is applied to measurements of the metallicity of local F and G dwarf stars. Our results confirm the 'G dwarf problem', i.e. the lack of metal poor G dwarfs relative to predictions from 'closed box models' of stellar formation.
Summary. The Wasserstein distance is an attractive tool for data analysis but statistical inference is hindered by the lack of distributional limits. To overcome this obstacle, for probability measures supported on finitely many points, we derive the asymptotic distribution of empirical Wasserstein distances as the optimal value of a linear programme with random objective function. This facilitates statistical inference (e.g. confidence intervals for sample-based Wasserstein distances) in large generality. Our proof is based on directional Hadamard differentiability. Failure of the classical bootstrap and alternatives are discussed. The utility of the distributional results is illustrated on two data sets.
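For readers who want to see an empirical Wasserstein distance on a finite support, here is a minimal sketch of a special case only: support points on the real line with ground cost |x − y|. In that case the transport linear programme has a closed-form optimal value, namely the sum of absolute differences of the two cumulative distribution functions weighted by the gaps between neighbouring support points. The general finite-space setting of the abstract requires a linear programme solver and is not covered here. The names `wasserstein1_on_line` and `empirical_measure`, the support and the samples are hypothetical.

```python
def empirical_measure(sample, points):
    """Relative frequencies of the support points in a sample."""
    n = len(sample)
    return [sum(1 for v in sample if v == x) / n for x in points]


def wasserstein1_on_line(points, r, s):
    """1-Wasserstein distance between probability vectors r and s on
    real-valued support points, with ground cost |x - y|.

    Uses the one-dimensional identity
        W1 = sum_k |R_k - S_k| * (x_(k+1) - x_(k)),
    where R and S are the cumulative sums of r and s in sorted order.
    """
    order = sorted(range(len(points)), key=lambda k: points[k])
    total = 0.0
    cdf_gap = 0.0
    for a, b in zip(order, order[1:]):
        cdf_gap += r[a] - s[a]
        total += abs(cdf_gap) * (points[b] - points[a])
    return total


# Hypothetical support and samples.
points = [0.0, 1.0, 3.0]
r_hat = empirical_measure([0.0, 0.0, 1.0, 1.0], points)   # (0.5, 0.5, 0.0)
s_hat = empirical_measure([1.0, 3.0, 3.0, 1.0], points)   # (0.0, 0.5, 0.5)
distance = wasserstein1_on_line(points, r_hat, s_hat)
```

Here half of the mass has to travel a total distance of 3 (for example from 0 to 3), so the distance is 1.5. Repeating the computation over resampled data would give a naive bootstrap, which the abstract notes can fail in this setting.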
We introduce an approach based on the recently introduced functional mode analysis to identify collective modes of internal dynamics that maximally correlate to an external order parameter of functional interest. Input structural data can be either experimentally determined structure ensembles or simulated ensembles, such as molecular dynamics trajectories. Partial least-squares regression is shown to yield a robust solution to the multidimensional optimization problem, with a minimal and controllable risk of overfitting, as shown by extensive cross-validation. Several examples illustrate that the partial least-squares-based functional mode analysis successfully reveals the collective dynamics underlying the fluctuations in selected functional order parameters. Applications to T4 lysozyme, the Trp-cage, the aquaporin channels Aqy1 and hAQP1, and the CLC-ec1 chloride antiporter are presented in which the active site geometry, the hydrophobic solvent-accessible surface, channel gating dynamics, water permeability (p_f), and a dihedral angle are defined as functional order parameters. The Aqy1 case reveals a gating mechanism that connects the inner channel gating residues with the protein surface, thereby providing an explanation of how the membrane may affect the channel. hAQP1 shows how the p_f correlates with structural changes around the aromatic/arginine region of the pore. The CLC-ec1 application shows how local motions of the gating Glu148 couple to a collective motion that affects ion affinity in the pore.
The exact mean-squared error (MSE) of estimators of the variance in nonparametric regression based on quadratic forms is investigated. In particular, two classes of estimators are compared: Hall, Kay and Titterington's optimal difference-based estimators and a class of ordinary difference-based estimators which generalize methods proposed by Rice and Gasser, Sroka and Jennen-Steinmetz. For small sample sizes the MSE of the first estimator is essentially increased by the magnitude of the integrated first two squared derivatives of the regression function. It is shown that in many situations ordinary difference-based estimators are more appropriate for estimating the variance, because they control the bias much better and hence have a much better overall performance. It is also demonstrated that Rice's estimator does not always behave well. Data-driven guidelines are given to select the estimator with the smallest MSE.
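To illustrate how the regression function enters the bias of a difference-based variance estimator, the sketch below implements a generic estimator of the form σ̂² = (n − r)⁻¹ Σᵢ (Σⱼ dⱼ yᵢ₊ⱼ)² for a difference sequence d₀, …, dᵣ with Σ dⱼ = 0 and Σ dⱼ² = 1. It assumes an equidistant design. The first-order sequence (1/√2, −1/√2) corresponds to Rice's estimator and the second-order sequence (1/√6, −2/√6, 1/√6) to the estimator of Gasser, Sroka and Jennen-Steinmetz on an equidistant grid. The function name `difference_variance` and the noise-free linear toy data are hypothetical.

```python
import math


def difference_variance(y, d):
    """Difference-based variance estimate with difference sequence d.

    Assumes sum(d) == 0 and sum(dj**2 for dj in d) == 1, and an
    equidistant design for the observations y.
    """
    r = len(d) - 1
    n = len(y)
    total = 0.0
    for i in range(n - r):
        diff = sum(dj * y[i + j] for j, dj in enumerate(d))
        total += diff * diff
    return total / (n - r)


# First-order differences (Rice) and second-order differences
# (Gasser, Sroka and Jennen-Steinmetz, equidistant design).
RICE = [1 / math.sqrt(2), -1 / math.sqrt(2)]
GSJ = [1 / math.sqrt(6), -2 / math.sqrt(6), 1 / math.sqrt(6)]

# Hypothetical noise-free data on a linear trend: the true variance is 0.
y = [0.0, 1.0, 2.0, 3.0, 4.0]
rice_estimate = difference_variance(y, RICE)   # picks up the slope as bias
gsj_estimate = difference_variance(y, GSJ)     # second differences remove it
```

On this noise-free linear trend the first-order estimate equals half the squared increment (0.5), which is pure bias, while the second-order estimate is zero up to rounding. The example only shows the bias side of the comparison; it says nothing about the variance of the estimators, which also enters the MSE.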