Summary Many contemporary large‐scale applications involve building interpretable models linking a large set of potential covariates to a response in a non‐linear fashion, such as when the response is binary. Although this modelling problem has been extensively studied, it remains unclear how to control the fraction of false discoveries effectively even in high dimensional logistic regression, not to mention general high dimensional non‐linear models. To address such a practical problem, we propose a new framework of ‘model‐X’ knockoffs, which reads from a different perspective the knockoff procedure that was originally designed for controlling the false discovery rate in linear models. Whereas the knockoffs procedure is constrained to homoscedastic linear models with n⩾p, the key innovation here is that model‐X knockoffs provide valid inference from finite samples in settings in which the conditional distribution of the response is arbitrary and completely unknown. Furthermore, this holds no matter the number of covariates. Correct inference in such a broad setting is achieved by constructing knockoff variables probabilistically instead of geometrically. To do this, our approach requires that the covariates are random (independent and identically distributed rows) with a distribution that is known, although we provide preliminary experimental evidence that our procedure is robust to unknown or estimated distributions. To our knowledge, no other procedure solves the controlled variable selection problem in such generality but, in the restricted settings where competitors exist, we demonstrate the superior power of knockoffs through simulations. Finally, we apply our procedure to data from a case–control study of Crohn's disease in the UK, making twice as many discoveries as the original analysis of the same data.
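As an illustration of the selection step that a knockoff procedure ends with, the following is a minimal sketch, not the authors' implementation. It assumes that a vector of feature statistics `W` has already been computed from the covariates and their knockoffs, where a large positive `W[j]` is evidence that variable `j` matters. The function names, the default `offset=1` (the conservative "knockoff+" variant) and the toy numbers are assumptions made for illustration; constructing the knockoff variables themselves is not shown.

```python
import numpy as np

def knockoff_threshold(W, q=0.1, offset=1):
    """Return the smallest t among the nonzero |W_j| whose estimated
    false discovery proportion is at most q, or inf if none qualifies."""
    W = np.asarray(W, dtype=float)
    candidates = np.unique(np.abs(W[W != 0]))  # np.unique returns sorted values
    for t in candidates:
        # Negative statistics stand in for the number of false positives.
        fdp_hat = (offset + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return float(t)
    return float("inf")

def knockoff_select(W, q=0.1, offset=1):
    """Indices of variables whose statistic reaches the data-dependent threshold."""
    W = np.asarray(W, dtype=float)
    tau = knockoff_threshold(W, q=q, offset=offset)
    return np.flatnonzero(W >= tau)

# Hypothetical statistics for ten variables (illustrative values only).
W_example = [5.0, 4.0, 3.0, 2.5, 2.0, 1.5, -1.0, 0.5, 6.0, 7.0]
selected = knockoff_select(W_example, q=0.2)
```

With `offset=1` an empty selection is returned whenever too few statistics are positive relative to the target level `q`, which is the price of the finite-sample guarantee.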
In this paper we present a novel probabilistic sampling-based motion planning algorithm called the Fast Marching Tree algorithm (FMT*). The algorithm is specifically aimed at solving complex motion planning problems in high-dimensional configuration spaces. This algorithm is proven to be asymptotically optimal and is shown to converge to an optimal solution faster than its state-of-the-art counterparts, chiefly PRM* and RRT*. The FMT* algorithm performs a “lazy” dynamic programming recursion on a predetermined number of probabilistically-drawn samples to grow a tree of paths, which moves steadily outward in cost-to-arrive space. As such, this algorithm combines features of both single-query algorithms (chiefly RRT) and multiple-query algorithms (chiefly PRM), and is reminiscent of the Fast Marching Method for the solution of Eikonal equations. As a departure from previous analysis approaches that are based on the notion of almost sure convergence, the FMT* algorithm is analyzed under the notion of convergence in probability: the extra mathematical flexibility of this approach allows for convergence rate bounds, the first in the field of optimal sampling-based motion planning. Specifically, for a certain selection of tuning parameters and configuration spaces, we obtain a convergence rate bound of order $O(n^{-1/d+\rho})$, where $n$ is the number of sampled points, $d$ is the dimension of the configuration space, and $\rho$ is an arbitrarily small constant. We go on to demonstrate asymptotic optimality for a number of variations on FMT*, namely when the configuration space is sampled non-uniformly, when the cost is not arc length, and when connections are made based on the number of nearest neighbors instead of a fixed connection radius.
Numerical experiments over a range of dimensions and obstacle configurations confirm our theoretical and heuristic arguments by showing that FMT*, for a given execution time, returns substantially better solutions than either PRM* or RRT*, especially in high-dimensional configuration spaces and in scenarios where collision-checking is expensive.
This article presents a novel approach, named MCMP (Monte Carlo Motion Planning), to the problem of motion planning under uncertainty, i.e., to the problem of computing a low-cost path that fulfills probabilistic collision avoidance constraints. MCMP estimates the collision probability (CP) of a given path by sampling via Monte Carlo the execution of a reference tracking controller (in this paper we consider LQG). The key algorithmic contribution of this paper is the design of statistical variance-reduction techniques, namely control variates and importance sampling, to make such a sampling procedure amenable to real-time implementation. MCMP applies this CP estimation procedure to motion planning by iteratively (i) computing an (approximately) optimal path for the deterministic version of the problem (here, using the FMT* algorithm), (ii) computing the CP of this path, and (iii) inflating or deflating the obstacles by a common factor depending on whether the CP is higher or lower than a target value. The advantages of MCMP are threefold: (i) asymptotic correctness of CP estimation, as opposed to most current approximations, which, as shown in this paper, can be off by large multiples and hinder the computation of feasible plans; (ii) speed and parallelizability; and (iii) generality, i.e., the approach is applicable to virtually any planning problem provided that a path tracking controller and a notion of distance to obstacles in the configuration space are available. Numerical results illustrate the correctness (in terms of feasibility), efficiency (in terms of path cost), and computational speed of MCMP.
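The three-step iteration described above can be sketched as a search over a single obstacle inflation factor. This is a hedged illustration rather than the paper's method: the update rule used here is plain bisection, which assumes the CP decreases as the obstacles are inflated, and the callables `plan_path` and `estimate_cp` are hypothetical placeholders for a deterministic planner (such as FMT*) and a Monte Carlo CP estimator. The toy stand-ins at the bottom exist only to make the sketch runnable.

```python
import math

def mcmp_outer_loop(plan_path, estimate_cp, cp_target,
                    lo=0.0, hi=2.0, tol=0.01, max_iters=30):
    """Bisection over the inflation factor (an assumed update rule).

    plan_path(factor) -> path planned with obstacles inflated by `factor`
    estimate_cp(path) -> estimated collision probability of that path
    Returns (path, cp, factor) for the least inflated plan found that
    meets cp_target, or None if no tested factor was safe enough.
    """
    best = None
    for _ in range(max_iters):
        factor = 0.5 * (lo + hi)
        path = plan_path(factor)        # step (i): deterministic planning
        cp = estimate_cp(path)          # step (ii): CP of the planned path
        if cp > cp_target:
            lo = factor                 # step (iii): too risky, inflate
        else:
            best = (path, cp, factor)
            hi = factor                 # step (iii): safe, try deflating
        if hi - lo < tol:
            break
    return best

# Toy stand-ins (hypothetical): a "path" is just its clearance from the
# obstacles, and the CP decays exponentially with that clearance.
def toy_plan_path(factor):
    return {"clearance": factor}

def toy_estimate_cp(path):
    return math.exp(-3.0 * path["clearance"])

result = mcmp_outer_loop(toy_plan_path, toy_estimate_cp, cp_target=0.05)
```

Keeping the least inflated safe plan reflects the trade-off in the abstract: more inflation lowers the CP but generally raises the path cost.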
Consider the following three important problems in statistical inference, namely, constructing confidence intervals for (1) the error of a high-dimensional (p > n) regression estimator, (2) the linear regression noise level, and (3) the genetic signal-to-noise ratio of a continuous-valued trait (related to the heritability). All three problems turn out to be closely related to the little-studied problem of performing inference on the ℓ2-norm of the signal in high-dimensional linear regression. We derive a novel procedure for this, which is asymptotically correct when the covariates are multivariate Gaussian and produces valid confidence intervals in finite samples as well. The procedure, called EigenPrism, is computationally fast and makes no assumptions on coefficient sparsity or knowledge of the noise level. We investigate the width of the EigenPrism confidence intervals, including a comparison with a Bayesian setting in which our interval is just 5% wider than the Bayes credible interval. We are then able to unify the three aforementioned problems by showing that the EigenPrism procedure with only minor modifications is able to make important contributions to all three. We also investigate the robustness of coverage and find that the method applies in practice and in finite samples much more widely than just the case of multivariate Gaussian covariates. Finally, we apply EigenPrism to a genetic dataset to estimate the genetic signal-to-noise ratio for a number of continuous phenotypes.
Background Examination of at least 16 lymph nodes (LNs) has been traditionally recommended during gastric adenocarcinoma (GAC) resection to optimize staging, but the impact of this strategy on survival is uncertain. As recent randomized trials have demonstrated a therapeutic benefit from extended lymphadenectomy, we sought to investigate the impact of the number of LNs removed on prognosis after GAC resection. Study Design Patients who underwent gastrectomy for GAC from 2000 to 2012 at seven US academic institutions were analyzed. Patients with M1 disease or R2 resections were excluded. Disease-specific survival (DSS) was calculated using the Kaplan-Meier method and compared using log-rank and Cox regression analyses. Results Of 742 patients, 257 (35%) had 7–15 LNs removed and 485 (65%) had ≥16 LNs removed. DSS was not significantly longer after removal of ≥16 versus 7–15 LNs (10-year, 55% versus 47%; P = 0.53) for the entire cohort, but was significantly improved in the subset of patients with stage IA-IIIA (10-year, 74% versus 57%; P = 0.018) or N0-2 disease (72% versus 55%, P = 0.023). Similarly, for patients who were classified to more likely be “true N0-2”, based on frequentist analysis incorporating both the number of positive and of total LNs removed, the hazard ratio for disease-related death (adjusted for T stage, R status, grade, receipt of neoadjuvant and adjuvant therapy, as well as institution) significantly decreased as the number of LNs removed increased. Conclusions The number of lymph nodes removed during gastrectomy for adenocarcinoma appears itself to have prognostic implications on long-term survival.
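For readers unfamiliar with the Kaplan-Meier method named in the study design, a minimal sketch of the product-limit estimator follows. It multiplies, over each distinct event time, the factor (1 − deaths / number at risk). The function name and the five toy observations are assumptions made for illustration and are not data from the study.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time for each patient
    events : 1 if the event (e.g. disease-related death) was observed,
             0 if the patient was censored at that time
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set.
        n_at_risk -= leaving
    return curve

# Hypothetical follow-up data (years) for five patients.
toy_times = [1, 2, 2, 3, 4]
toy_events = [1, 1, 0, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
```

Censored patients contribute to the risk set up to their last follow-up but never trigger a drop in the curve, which is why the estimate only changes at observed event times.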
Motion planning under differential constraints is a classic problem in robotics. To date, the state of the art is represented by sampling-based techniques, with the Rapidly-exploring Random Tree algorithm as a leading example. Yet, the problem is still open in many aspects, including guarantees on the quality of the obtained solution. In this paper we provide a thorough theoretical framework to assess optimality guarantees of sampling-based algorithms for planning under differential constraints. We exploit this framework to design and analyze two novel sampling-based algorithms that are guaranteed to converge, as the number of samples increases, to an optimal solution (namely, the Differential Probabilistic RoadMap algorithm and the Differential Fast Marching Tree algorithm). Our focus is on driftless control-affine dynamical models, which accurately model a large class of robotic systems. In this paper we use the notion of convergence in probability (as opposed to convergence almost surely): the extra mathematical flexibility of this approach yields convergence rate bounds — a first in the field of optimal sampling-based motion planning under differential constraints. Numerical experiments corroborating our theoretical results are presented and discussed.