The pattern of zero entries in the inverse covariance matrix of a multivariate normal distribution corresponds to conditional independence restrictions between variables. Covariance selection aims at estimating those structural zeros from data. We show that neighborhood selection with the Lasso is a computationally attractive alternative to standard covariance selection for sparse high-dimensional graphs. Neighborhood selection estimates the conditional independence restrictions separately for each node in the graph and is hence equivalent to variable selection for Gaussian linear models. We show that the proposed neighborhood selection scheme is consistent for sparse high-dimensional graphs. Consistency hinges on the choice of the penalty parameter. The oracle value for optimal prediction does not lead to a consistent neighborhood estimate. Controlling instead the probability of falsely joining some distinct connectivity components of the graph, consistent estimation for sparse graphs is achieved (with exponential rates), even when the number of variables grows as the number of observations raised to an arbitrary power. Published at http://dx.doi.org/10.1214/009053606000000281 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
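The core procedure can be sketched in a few lines: each variable is regressed on all others with the Lasso, and the non-zero coefficients define that node's estimated neighborhood. The sketch below, in Python with scikit-learn, is illustrative only; the choice of the penalty by cross-validation and the AND rule for symmetrizing the edge set are assumptions of this example, not prescriptions of the paper, which instead ties the penalty level to the probability of falsely connecting distinct components.

```python
# Minimal sketch of neighborhood selection with the Lasso.
# Assumes X is an (n x p) data matrix from a multivariate Gaussian.
# The AND rule for combining neighborhoods and the use of LassoCV
# are illustrative choices, not the paper's recommended tuning.
import numpy as np
from sklearn.linear_model import LassoCV

def neighborhood_selection(X, and_rule=True):
    n, p = X.shape
    selected = np.zeros((p, p), dtype=bool)
    for j in range(p):
        others = np.delete(np.arange(p), j)
        lasso = LassoCV(cv=5).fit(X[:, others], X[:, j])
        selected[j, others] = lasso.coef_ != 0
    # Combine the per-node neighborhoods into an undirected edge set.
    adjacency = selected & selected.T if and_rule else selected | selected.T
    return adjacency
```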
Summary. Estimation of structure, such as in variable selection, graphical modelling or cluster analysis, is notoriously difficult, especially for high dimensional data. We introduce stability selection. It is based on subsampling in combination with (high dimensional) selection algorithms. As such, the method is extremely general and has a very wide range of applicability. Stability selection provides finite sample control for some error rates of false discoveries and hence a transparent principle to choose a proper amount of regularization for structure estimation. Variable selection and structure estimation improve markedly for a range of selection methods if stability selection is applied. We prove for the randomized lasso that stability selection will be variable selection consistent even if the necessary conditions for consistency of the original lasso method are violated. We demonstrate stability selection for variable selection and Gaussian graphical modelling, using real and simulated data.
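As a rough illustration of the idea, the sketch below repeatedly subsamples half of the data, runs a plain Lasso as the base selection algorithm, and keeps the variables whose selection frequency exceeds a threshold. The subsample count, penalty level and threshold are arbitrary choices for this example; the randomized-lasso variant and the finite-sample error bounds discussed in the paper are not reproduced here.

```python
# Minimal sketch of stability selection with the (plain) Lasso as the
# base selector. Subsample size n/2, number of subsamples B and the
# selection threshold pi_thr are illustrative values only.
import numpy as np
from sklearn.linear_model import Lasso

def stability_selection(X, y, alpha=0.1, B=100, pi_thr=0.6, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(B):
        idx = rng.choice(n, size=n // 2, replace=False)
        coef = Lasso(alpha=alpha).fit(X[idx], y[idx]).coef_
        counts += coef != 0
    frequencies = counts / B
    return np.where(frequencies >= pi_thr)[0], frequencies
```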
More than 100 countries have adopted a global warming limit of 2 degrees C or below (relative to pre-industrial levels) as a guiding principle for mitigation efforts to reduce climate change risks, impacts and damages. However, the greenhouse gas (GHG) emissions corresponding to a specified maximum warming are poorly known owing to uncertainties in the carbon cycle and the climate response. Here we provide a comprehensive probabilistic analysis aimed at quantifying GHG emission budgets for the 2000-50 period that would limit warming throughout the twenty-first century to below 2 degrees C, based on a combination of published distributions of climate system properties and observational constraints. We show that, for the chosen class of emission scenarios, both cumulative emissions up to 2050 and emission levels in 2050 are robust indicators of the probability that twenty-first century warming will not exceed 2 degrees C relative to pre-industrial temperatures. Limiting cumulative CO2 emissions over 2000-50 to 1,000 Gt CO2 yields a 25% probability of warming exceeding 2 degrees C, and a limit of 1,440 Gt CO2 yields a 50% probability, given a representative estimate of the distribution of climate system properties. As known 2000-06 CO2 emissions were approximately 234 Gt CO2, less than half the proven economically recoverable oil, gas and coal reserves can still be emitted up to 2050 to achieve such a goal. Recent G8 Communiqués envisage halved global GHG emissions by 2050, for which we estimate a 12-45% probability of exceeding 2 degrees C, assuming 1990 as the emission base year and a range of published climate sensitivity distributions. Emission levels in 2020 are a less robust indicator, but for the scenarios considered, the probability of exceeding 2 degrees C rises to 53-87% if global GHG emissions are still more than 25% above 2000 levels in 2020.
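The headline numbers above imply a simple remaining-budget calculation, sketched below; the exceedance probabilities themselves come from the paper's probabilistic analysis and are only quoted, not recomputed.

```python
# Back-of-the-envelope budget arithmetic using the figures quoted above;
# the 25% and 50% exceedance probabilities are taken directly from the
# paper's representative distribution and are not derived here.
budget_25pct = 1000.0    # Gt CO2 over 2000-50 for a 25% chance of exceeding 2 C
budget_50pct = 1440.0    # Gt CO2 over 2000-50 for a 50% chance of exceeding 2 C
emitted_2000_06 = 234.0  # Gt CO2 already emitted over 2000-06

print(f"Remaining to 2050 (25% risk): {budget_25pct - emitted_2000_06:.0f} Gt CO2")
print(f"Remaining to 2050 (50% risk): {budget_50pct - emitted_2000_06:.0f} Gt CO2")
```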
Global efforts to mitigate climate change are guided by projections of future temperatures. But the eventual equilibrium global mean temperature associated with a given stabilization level of atmospheric greenhouse gas concentrations remains uncertain, complicating the setting of stabilization targets to avoid potentially dangerous levels of global warming. Similar problems apply to the carbon cycle: observations currently provide only a weak constraint on the response to future emissions. Here we use ensemble simulations of simple climate-carbon-cycle models constrained by observations and projections from more comprehensive models to simulate the temperature response to a broad range of carbon dioxide emission pathways. We find that the peak warming caused by a given cumulative carbon dioxide emission is better constrained than the warming response to a stabilization scenario. Furthermore, the relationship between cumulative emissions and peak warming is remarkably insensitive to the emission pathway (timing of emissions or peak emission rate). Hence policy targets based on limiting cumulative emissions of carbon dioxide are likely to be more robust to scientific uncertainty than emission-rate or concentration targets. Total anthropogenic emissions of one trillion tonnes of carbon (3.67 trillion tonnes of CO2), about half of which has already been emitted since industrialization began, result in a most likely peak carbon-dioxide-induced warming of 2 degrees C above pre-industrial temperatures, with a 5-95% confidence interval of 1.3-3.9 degrees C.
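Reading off the headline figure (one trillion tonnes of carbon giving a most likely peak warming of about 2 degrees C), an approximately proportional relationship between cumulative emissions and peak warming can be illustrated as below. This linear reading is a rough simplification for illustration only; the paper's estimate comes from observationally constrained ensemble simulations, not from this scaling.

```python
# Illustrative linear scaling implied by the headline figure quoted above
# (1 trillion tonnes of carbon -> most likely peak warming of about 2 C).
# This is a rough reading-off of the abstract, not the paper's model.
def likely_peak_warming(cumulative_carbon_ttc):
    """Rough most-likely peak CO2-induced warming (deg C) for cumulative emissions in TtC."""
    return 2.0 * cumulative_carbon_ttc

print(likely_peak_warming(0.5))  # ~1 C for the roughly 0.5 TtC already emitted
print(likely_peak_warming(1.0))  # ~2 C for the full trillion tonnes
```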
Summary. What is the difference between a prediction made with a causal model and one made with a non-causal model? Suppose that we intervene on the predictor variables or change the whole environment. The predictions from a causal model will in general work as well under interventions as for observational data. In contrast, predictions from a non-causal model can potentially be very wrong if we actively intervene on variables. Here, we propose to exploit this invariance of a prediction under a causal model for causal inference: given different experimental settings (e.g. various interventions) we collect all models that do show invariance in their predictive accuracy across settings and interventions. The causal model will be a member of this set of models with high probability. This approach yields valid confidence intervals for the causal relationships in quite general scenarios. We examine the example of structural equation models in more detail and provide sufficient assumptions under which the set of causal predictors becomes identifiable. We further investigate robustness properties of our approach under model misspecification and discuss possible extensions. The empirical properties are studied for various data sets, including large-scale gene perturbation experiments.
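For linear models the invariance idea can be sketched directly: for every candidate set of predictors, fit a pooled regression and test whether the residual distribution looks the same in every experimental setting; the sets that pass form the collection from which causal predictors and confidence statements are derived. The tests used below (one-way ANOVA on residual means and a Levene test on residual spread) and the exhaustive search over subsets are illustrative stand-ins, not the exact procedure of the paper.

```python
# Minimal sketch of the invariance idea for linear models. env is a
# length-n array of environment labels. Exhaustive subset search is
# only feasible for small p; the residual tests here are illustrative
# surrogates for the invariance tests developed in the paper.
from itertools import chain, combinations
import numpy as np
from scipy import stats

def invariant_sets(X, y, env, alpha=0.05):
    n, p = X.shape
    accepted = []
    for S in chain.from_iterable(combinations(range(p), k) for k in range(p + 1)):
        XS = X[:, list(S)]                         # (n, |S|), possibly empty
        design = np.hstack([np.ones((n, 1)), XS])
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ beta
        groups = [resid[env == e] for e in np.unique(env)]
        p_mean = stats.f_oneway(*groups).pvalue    # equal residual means?
        p_var = stats.levene(*groups).pvalue       # equal residual spread?
        if min(p_mean, p_var) > alpha / 2:         # crude Bonferroni split
            accepted.append(set(S))
    causal_estimate = set.intersection(*accepted) if accepted else set()
    return causal_estimate, accepted
```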
The Lasso is an attractive technique for regularization and variable selection for high-dimensional data, where the number of predictor variables $p_n$ is potentially much larger than the number of samples $n$. However, it was recently discovered that the sparsity pattern of the Lasso estimator can only be asymptotically identical to the true sparsity pattern if the design matrix satisfies the so-called irrepresentable condition. The latter condition can easily be violated in the presence of highly correlated variables. Here we examine the behavior of the Lasso estimators if the irrepresentable condition is relaxed. Even though the Lasso cannot recover the correct sparsity pattern, we show that the estimator is still consistent in the $\ell_2$-norm sense for fixed designs under conditions on (a) the number $s_n$ of nonzero components of the vector $\beta_n$ and (b) the minimal singular values of design matrices that are induced by selecting small subsets of variables. Furthermore, a rate of convergence result is obtained on the $\ell_2$ error with an appropriate choice of the smoothing parameter. The rate is shown to be optimal under the condition of bounded maximal and minimal sparse eigenvalues. Our results imply that, with high probability, all important variables are selected. The set of selected variables is a meaningful reduction on the original set of variables. Finally, our results are illustrated with the detection of closely adjacent frequencies, a problem encountered in astrophysics. Published at http://dx.doi.org/10.1214/07-AOS582 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
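A small simulation makes the distinction between the two notions of recovery concrete. In the design below, an irrelevant third column is strongly correlated with both relevant ones, a textbook-style setting in which the irrepresentable condition fails; the Lasso then tends to include the spurious variable, yet the $\ell_2$ error of the coefficient estimate typically stays modest. The dimensions and the penalty level are arbitrary choices for this illustration, not the conditions analyzed in the paper.

```python
# Correlated design where exact support recovery tends to fail but the
# l2 error of the Lasso estimate remains modest. Parameters are
# illustrative only.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 1000
z1, z2, e = rng.standard_normal((3, n))
x3 = (2 / 3) * z1 + (2 / 3) * z2 + (1 / 3) * e   # irrelevant but correlated column
X = np.column_stack([z1, z2, x3])
beta = np.array([2.0, 3.0, 0.0])
y = X @ beta + rng.standard_normal(n)

beta_hat = Lasso(alpha=0.05).fit(X, y).coef_
print("selected support:", np.flatnonzero(beta_hat))  # often includes the spurious x3
print("l2 error:", np.linalg.norm(beta_hat - beta))   # nonetheless typically modest
```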
Assigning significance in high-dimensional regression is challenging. Most computationally efficient selection algorithms cannot guard against inclusion of noise variables. Asymptotically valid p-values are not available. An exception is a recent proposal by Wasserman and Roeder (2008) which splits the data into two parts. The number of variables is then reduced to a manageable size using the first split, while classical variable selection techniques can be applied to the remaining variables, using the data from the second split. This yields asymptotic error control under minimal conditions. It involves, however, a one-time random split of the data. Results are sensitive to this arbitrary choice: it amounts to a 'p-value lottery' and makes it difficult to reproduce results. Here, we show that inference across multiple random splits can be aggregated, while keeping asymptotic control over the inclusion of noise variables. We show that the resulting p-values can be used for control of both the family-wise error rate (FWER) and the false discovery rate (FDR). In addition, the proposed aggregation is shown to improve power while reducing the number of falsely selected variables substantially.
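A stripped-down version of the multi-split procedure is sketched below: on each random split, a Lasso screens variables on one half and OLS p-values are computed on the other half (Bonferroni-adjusted by the size of the selected set), and the per-split p-values are then aggregated. The fixed-quantile aggregation min(1, 2 x median) used here is a simplified special case of the aggregation in the paper, and the choices of base selector and number of splits are assumptions of this example.

```python
# Minimal sketch of multi-split p-values with a Lasso screening step
# and OLS p-values on the held-out half; the aggregation rule
# min(1, 2 * median) is a simplified fixed-quantile version.
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LassoCV

def multi_split_pvalues(X, y, B=50, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    pvals = np.ones((B, p))
    for b in range(B):
        perm = rng.permutation(n)
        first, second = perm[: n // 2], perm[n // 2:]
        sel = np.flatnonzero(LassoCV(cv=5).fit(X[first], y[first]).coef_)
        if sel.size == 0 or sel.size + 1 >= second.size:
            continue  # keep p-values at 1 for this split
        fit = sm.OLS(y[second], sm.add_constant(X[second][:, sel])).fit()
        pvals[b, sel] = np.minimum(fit.pvalues[1:] * sel.size, 1.0)
    return np.minimum(2 * np.median(pvals, axis=0), 1.0)
```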
Abstract. Anthropogenic increases in atmospheric greenhouse gas concentrations are the main driver of current and future climate change. The integrated assessment community has quantified anthropogenic emissions for the shared socio-economic pathway (SSP) scenarios, each of which represents a different future socio-economic projection and political environment. Here, we provide the greenhouse gas concentrations for these SSP scenarios, using the reduced-complexity climate-carbon-cycle model MAGICC7.0. We extend historical, observationally based concentration data with SSP concentration projections from 2015 to 2500 for 43 greenhouse gases with monthly and latitudinal resolution. CO2 concentrations by 2100 range from 393 to 1135 ppm for the lowest (SSP1-1.9) and highest (SSP5-8.5) emission scenarios, respectively. We also provide concentration extensions beyond 2100 based on assumptions regarding the trajectories of fossil fuel and land use change emissions, net negative emissions, and the fraction of non-CO2 emissions. By 2150, CO2 concentrations in the lowest emission scenario are approximately 350 ppm and approximately plateau at that level until 2500, whereas the highest fossil-fuel-driven scenario projects CO2 concentrations of 1737 ppm by 2150 and reaches concentrations beyond 2000 ppm by 2250. We estimate that the share of CO2 in the total radiative forcing contribution of all 43 considered long-lived greenhouse gases increases from 66% for the present day to roughly 68% to 85% by the time of maximum forcing in the 21st century. For this estimation, we updated simple radiative forcing parameterizations that reflect the Oslo Line-By-Line model results. In comparison to the representative concentration pathways (RCPs), the five main SSPs (SSP1-1.9, SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5) are more evenly spaced and extend to lower 2100 radiative forcing and temperatures. Performing two pairs of six-member historical ensembles with CESM1.2.2, we estimate the effect on surface air temperatures of applying latitudinally and seasonally resolved GHG concentrations. We find that the ensemble differences in the March-April-May (MAM) season show a regional warming in higher northern latitudes of up to 0.4 K over the historical period, about 0.1 K when averaged latitudinally, which we estimate to be comparable to the upper bound (about the 5% level) of natural variability. Compared with the relatively flat concentrations of the last 2000 years, the greenhouse gas concentrations since the onset of the industrial period and this study's projections over the next 100 to 500 years unequivocally depict a "hockey-stick" upward shape. The SSP concentration time series derived in this study provide a harmonized set of input assumptions for long-term climate science analysis; they also provide an indication of the wide range of futures that societal developments and policy implementations can lead to, from multiple degrees of future warming on the one side to approximately 1.5 °C warming on the other.
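For orientation only, the widely cited simplified logarithmic expression for CO2 radiative forcing can be used to gauge the spread between the lowest and highest 2100 concentrations quoted above. Note that this is not the updated parameterization derived in the paper from Oslo Line-By-Line results, and the pre-industrial reference concentration used below is an approximate assumption.

```python
# Rough CO2 forcing comparison using the common simplified expression
# F = 5.35 * ln(C / C0) W m^-2. This is NOT the paper's updated
# parameterization; C0 is an approximate pre-industrial value.
import numpy as np

C0 = 278.0  # approximate pre-industrial CO2 concentration, ppm
for label, C in [("SSP1-1.9 (2100)", 393.0), ("SSP5-8.5 (2100)", 1135.0)]:
    print(f"{label}: ~{5.35 * np.log(C / C0):.1f} W m^-2 CO2 forcing")
```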