A growing trend in engineering and science is to use multiple computer codes with different levels of accuracy to study the same complex system. We propose a framework for sequential design and analysis of a pair of high-accuracy and low-accuracy computer codes. It first runs the two codes with a pair of nested Latin hypercube designs (NLHDs). Data from the initial experiment are used to fit a prediction model. If the accuracy of the fitted model falls below a prespecified threshold, the two codes are evaluated again, with new input values carefully chosen so that the expanded scenario sets still form a pair of NLHDs. The nested relationship between the two scenario sets makes it easier to model and calibrate the difference between the two sources. If necessary, this augmentation process can be repeated until the prediction model based on all available data reaches reasonable accuracy. The effectiveness of the proposed method is illustrated with several examples. MATLAB code is provided in the online supplement to this article.
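To make the nesting concrete, here is a minimal Python sketch of one standard way to construct a pair of NLHDs (not the authors' MATLAB implementation): in each dimension, the small design takes one fine-grid level inside each coarse stratum, and the remaining fine levels are permuted to complete the large design. The function name nested_lhd and the cell-midpoint mapping to (0,1) are illustrative choices.

```python
import numpy as np

def nested_lhd(n_small, c, dim, rng=None):
    """Sketch: a small LHD of size n_small whose points form a subset of a
    larger LHD of size n_small*c, column by column."""
    rng = np.random.default_rng(rng)
    n_large = n_small * c
    large = np.empty((n_large, dim))
    for j in range(dim):
        # one fine level inside each of the n_small coarse strata -> small design
        blocks = rng.permutation(n_small)
        small_levels = blocks * c + rng.integers(0, c, n_small)
        # fill the remaining fine levels in random order -> large design
        rest = np.setdiff1d(np.arange(n_large), small_levels)
        large[:n_small, j] = small_levels
        large[n_small:, j] = rng.permutation(rest)
    points = (large + 0.5) / n_large   # map levels to cell midpoints in (0, 1)
    return points[:n_small], points    # (high-accuracy design, low-accuracy design)

D_hi, D_lo = nested_lhd(n_small=5, c=3, dim=2, rng=0)
```

By construction, every row of D_hi is also a row of D_lo, so the high-accuracy runs reuse inputs already evaluated by the low-accuracy code, which is what simplifies modeling the discrepancy between the two sources.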
To find efficient screening methods for high dimensional linear regression models, this paper studies the relationship between model fitting and screening performance. Under a sparsity assumption, we show that a subset that includes the true submodel always yields a smaller residual sum of squares (i.e., better model fitting) than any subset that does not, in a general asymptotic setting. This indicates that, for screening important variables, we can follow a "better fitting, better screening" rule, i.e., pick a "better" subset that has better model fitting. To seek such a better subset, we consider the optimization problem associated with best subset regression. An EM algorithm, called orthogonalizing subset screening, and its accelerated version are proposed for searching for the best subset. Although the two algorithms cannot guarantee that the subset they yield is the best, their monotonicity property ensures that this subset fits better than the initial subsets generated by popular screening methods, and thus it can have better screening performance asymptotically. Simulation results show that our methods are very competitive in high dimensional variable screening, even for finite sample sizes.
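The paper's exact algorithm is not reproduced here; as a rough Python sketch of a monotone EM-style refinement in the same spirit, the following routine starts from a k-sparse initial estimate (e.g., from a screening method) and alternates a gradient step with hard-thresholding to the k largest coefficients. With the step-size bound below, each iteration cannot increase the least squares objective, which is the monotonicity that underlies "better fitting, better screening." The name oss_refine and all defaults are assumptions.

```python
import numpy as np

def oss_refine(X, y, beta0, k, n_iter=200):
    """Sketch: refine a k-sparse initial estimate so the least squares
    objective is non-increasing at every iteration."""
    # majorization constant: d*I - X^T X is positive semidefinite,
    # which guarantees the monotone descent property
    d = np.linalg.eigvalsh(X.T @ X).max()
    beta = beta0.copy()
    for _ in range(n_iter):
        # E-step-like update on the working response
        z = beta + X.T @ (y - X @ beta) / d
        # M-step: minimize the surrogate under the k-sparsity constraint,
        # i.e., keep the k largest entries of z in absolute value
        beta = np.zeros_like(z)
        top = np.argsort(np.abs(z))[-k:]
        beta[top] = z[top]
    return np.flatnonzero(beta)   # indices of the screened subset
```

Because each update minimizes a quadratic surrogate that majorizes the residual sum of squares, the returned subset fits at least as well as the initial one, which is the mechanism the abstract appeals to for improved screening.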
We introduce an efficient iterative algorithm, intended for various least squares problems, based on a design-of-experiments perspective. The algorithm, called orthogonalizing EM (OEM), works for ordinary least squares and extends easily to penalized least squares. The main idea of the procedure is to orthogonalize a design matrix by adding new rows and then solve the original problem by embedding the augmented design in a missing-data framework. We establish several attractive theoretical properties of OEM. For ordinary least squares with a singular regression matrix, an OEM sequence converges to the Moore-Penrose generalized inverse-based least squares estimator. For ordinary and penalized least squares with various penalties, it converges to a point having grouping coherence for fully aliased regression matrices. Convergence and the convergence rate of the algorithm are examined. Finally, we demonstrate that OEM is highly efficient for large-scale least squares and penalized least squares problems, and is considerably faster than competing methods when n is much larger than p. Supplementary materials for this article are available online.
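The augmented orthogonal design never has to be formed explicitly: treating the responses of the added rows as missing data yields an EM update that uses only X^T X and X^T y. Below is a minimal Python sketch of this update for ordinary least squares, assuming the standard majorization form; the function name, tolerance, and stopping rule are illustrative.

```python
import numpy as np

def oem_ols(X, y, n_iter=500, tol=1e-10):
    """Sketch of the OEM update for ordinary least squares: iterate
    beta <- (X^T y + (d*I - X^T X) beta) / d with d = lambda_max(X^T X)."""
    XtX, Xty = X.T @ X, X.T @ y
    d = np.linalg.eigvalsh(XtX).max()   # makes d*I - X^T X positive semidefinite
    p = X.shape[1]
    beta = np.zeros(p)
    # starting at zero keeps every iterate in the row space of X, so for a
    # singular design the limit is the minimum-norm (Moore-Penrose) solution
    for _ in range(n_iter):
        beta_new = (Xty + (d * np.eye(p) - XtX) @ beta) / d
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

The cost per iteration is a single p-by-p matrix-vector product once X^T X and X^T y are precomputed, which is why this style of update pays off when n is much larger than p; penalized variants modify only the per-coordinate update, not this overall structure.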
With the rapid development of metro systems, it has become increasingly important to study phenomena such as passenger flow distribution and passenger boarding behavior. Existing methods have difficulty describing actual situations accurately and extending to a whole metro system because of parameter uncertainties in their mathematical models. In this article, we propose a passenger-to-train assignment model to evaluate the probability that an individual passenger boards each feasible train, for both no-transfer and one-transfer trips. This model can be used to understand passenger flows and crowdedness. The input parameters of the model are the probabilities that passengers take each train and the probability distribution of egress time, the time to walk to the tap-out fare gate after alighting from the train. We present a likelihood method to estimate these parameters from automatic fare collection and automatic vehicle location data. The method constructs nonparametric density estimates, without assuming a parametric form for the distribution of egress time, and uses the EM algorithm to compute the maximum likelihood estimates. Simulation results indicate that the proposed estimates perform well. Applying our method to real data from the Beijing metro system, we identify different passenger flow patterns between peak and off-peak hours.
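To illustrate the estimation step, here is a hedged Python sketch of the EM idea for the latent train assignment, assuming for simplicity a parametric (normal) egress-time model rather than the nonparametric density estimates the paper constructs, and assuming every one of the K candidate trains is feasible for every passenger. The function name and input layout are assumptions.

```python
import numpy as np
from scipy.stats import norm

def em_assignment(E, n_iter=100):
    """Sketch: E[i, k] is the egress time passenger i would have had if
    they alighted from candidate train k (tap-out time minus train k's
    arrival time).  The latent variable is which train was boarded."""
    n, K = E.shape
    p = np.full(K, 1.0 / K)            # boarding probability for each train
    mu, sigma = E.mean(), E.std()      # initial egress-time model
    for _ in range(n_iter):
        # E-step: posterior probability that passenger i boarded train k
        w = p * norm.pdf(E, mu, sigma)
        w /= w.sum(axis=1, keepdims=True)
        # M-step: update boarding probabilities and the egress-time model
        p = w.mean(axis=0)
        mu = (w * E).sum() / n
        sigma = np.sqrt((w * (E - mu) ** 2).sum() / n)
    return p, mu, sigma
```

The posterior weights from the E-step are exactly the probabilities of individual passengers boarding each feasible train that the abstract describes; replacing the normal density with a kernel or other nonparametric estimate recovers the spirit of the paper's approach.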