Jerzy Wieczorek

2020

National statistical agencies lack a statistical methodology for expressing uncertainty in the estimated overall rankings they release. For example, the US Census Bureau produced an 'explicit' ranking of the states based on 2011 sample estimates of mean travel time to work. Current literature provides measures of uncertainty in estimated individual ranks, but not a direct measure of uncertainty for the estimated overall ranking. We construct and visualize a joint confidence region for the true unknown overall ranking that provides a measure of uncertainty in the estimated overall ranking.
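
A minimal sketch of the idea, assuming normally distributed estimates with known standard errors and Bonferroni-adjusted simultaneous intervals (the paper's exact construction may differ). If all k intervals cover their true means, then any population whose interval lies entirely below population i's must have a smaller true value, so counting non-overlapping intervals bounds each true rank. The joint region is then the set of rankings whose ranks all fall inside these bounds. The function name rank_confidence_region and the toy numbers are illustrative, not taken from the paper or from Census Bureau data.

```python
from statistics import NormalDist


def rank_confidence_region(estimates, std_errors, alpha=0.10):
    """Return (lowest, highest) plausible rank for each population,
    with rank 1 = smallest true value (illustrative sketch)."""
    k = len(estimates)
    # Bonferroni: each two-sided interval has level 1 - alpha/k, so all k
    # intervals cover simultaneously with probability at least 1 - alpha.
    z = NormalDist().inv_cdf(1 - alpha / (2 * k))
    lower = [e - z * s for e, s in zip(estimates, std_errors)]
    upper = [e + z * s for e, s in zip(estimates, std_errors)]
    region = []
    for i in range(k):
        # Populations whose whole interval is below i's are surely smaller.
        below = sum(1 for j in range(k) if j != i and upper[j] < lower[i])
        # Populations whose whole interval is above i's are surely larger.
        above = sum(1 for j in range(k) if j != i and lower[j] > upper[i])
        region.append((1 + below, k - above))
    return region
```

For example, rank_confidence_region([10.0, 11.0, 30.0], [1.0, 1.0, 1.0]) returns [(1, 2), (1, 2), (3, 3)]: the first two populations cannot be ordered with confidence, while the third is clearly ranked last.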

Techniques for Validating an Automatic Bottleneck Detection Tool Using Archived Freeway Sensor Data

Transportation Research Record

Fernandez-Moctezuma

Bertini

2010

describe travel reliability across days, weeks, months, and years. With the availability of this rich source of archived data, new analytical tools can be developed for use with historical data and in a real-time environment informed by past performance. The objective of this paper is to describe the techniques used for rigorous validation and refinement of an automated system to identify freeway bottlenecks within the PORTAL environment. Using ground truth knowledge of when and where bottlenecks occurred during a substantial sample period, previous work (2) developed and tested a working prototype intended to accurately identify, track, and display active bottleneck features using graphical tools. This paper presents an extended analysis of that prototype and contains new results and applications:

Bottlenecks are key features of freeway systems. Their effects on performance and emissions are of increasing importance as congestion worsens in urban areas. In the United States, FHWA has been working to identify and monitor key bottlenecks in each state. In Oregon, a freeway data archive known as the Portland Oregon Regional Transportation Archive Listing (PORTAL) archives measured count, density, and speed data from more than 600 locations at 20-s intervals. This archive has enabled development of online freeway performance and reliability analysis tools. This paper describes the rigorous evaluation and refinement of an automated tool for identifying recurrent freeway bottlenecks using historical data within the framework of the data archive. Efforts have focused on identifying and displaying active bottleneck features with graphical tools and on selecting optimal variables that enable careful identification of active bottlenecks. This research aims to detect bottleneck activation historically and, through future work, in real time as well. Ultimately, the results of this research will enhance the prioritization of improvements and implementation of operational strategies on the freeway network.
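
A simplified illustration of the kind of rule such a tool can apply to a pair of adjacent detectors, assuming a bottleneck is flagged when upstream speed is low while downstream speed is substantially higher. The threshold values (40 mph, 20 mph), the minimum run length, and the function names are hypothetical placeholders; choosing such variables is what the paper's validation work addresses. Each list element stands for one 20-s reporting interval.

```python
def active_intervals(upstream, downstream, max_upstream_mph=40.0,
                     min_differential_mph=20.0):
    """Flag intervals where the segment between two detectors looks like an
    active bottleneck: congested upstream, flowing freely downstream.
    Thresholds are hypothetical, not values from the paper."""
    return [u < max_upstream_mph and (d - u) > min_differential_mph
            for u, d in zip(upstream, downstream)]


def activation_periods(flags, min_steps=3):
    """Group consecutive flagged intervals into (start, end) index pairs,
    dropping runs shorter than min_steps to suppress short-lived noise."""
    periods, start = [], None
    for t, flag in enumerate(flags + [False]):  # sentinel closes a final run
        if flag and start is None:
            start = t
        elif not flag and start is not None:
            if t - start >= min_steps:
                periods.append((start, t - 1))
            start = None
    return periods
```

With upstream = [60, 55, 35, 30, 28, 32, 58, 60] and downstream = [62, 60, 58, 57, 56, 59, 60, 61], activation_periods(active_intervals(upstream, downstream)) returns [(2, 5)], a single activation lasting four 20-s intervals.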

K‐fold cross‐validation for complex sample surveys

Guerin

McMahon

2022

Stat

Although K‐fold cross‐validation (CV) is widely used for model evaluation and selection, there has been limited understanding of how to perform CV for non‐iid data, including those from sampling designs with unequal selection probabilities. We introduce CV methodology that is appropriate for design‐based inference from complex survey sampling designs. For such data, we claim that we will tend to make better inferences when we choose the folds and compute the test errors in ways that account for the survey design features such as stratification and clustering. Our mathematical arguments are supported with simulations, and our methods are illustrated on real survey data.
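
One way to read the abstract's recommendation, sketched under stated assumptions: folds are formed from whole clusters (PSUs) spread round-robin within each stratum, and held-out squared errors are pooled using the survey weights. The sketch assumes every stratum has at least K clusters and uses a one-covariate weighted least-squares line as the model; the function names are hypothetical, and the paper's estimators may differ in detail.

```python
import random


def survey_folds(strata, clusters, k, seed=0):
    """Assign each record to one of k folds so that every cluster (PSU) stays
    in a single fold and each stratum's clusters are spread across folds."""
    rng = random.Random(seed)
    fold_of_cluster = {}
    for h in sorted(set(strata)):
        psus = sorted({c for s, c in zip(strata, clusters) if s == h})
        rng.shuffle(psus)
        for i, c in enumerate(psus):
            fold_of_cluster[(h, c)] = i % k  # round-robin within the stratum
    return [fold_of_cluster[(s, c)] for s, c in zip(strata, clusters)]


def fit_wls_line(x, y, w):
    """Weighted least-squares intercept and slope for y ~ x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return ybar - slope * xbar, slope


def survey_cv_mse(x, y, w, folds, k):
    """Survey-weighted K-fold CV estimate of mean squared prediction error."""
    err, wsum = 0.0, 0.0
    for f in range(k):
        train = [i for i in range(len(y)) if folds[i] != f]
        test = [i for i in range(len(y)) if folds[i] == f]
        a, b = fit_wls_line([x[i] for i in train], [y[i] for i in train],
                            [w[i] for i in train])
        # Weight each held-out squared error by its sampling weight.
        err += sum(w[i] * (y[i] - (a + b * x[i])) ** 2 for i in test)
        wsum += sum(w[i] for i in test)
    return err / wsum
```

Keeping clusters intact mimics how the design sampled whole PSUs, so a held-out fold resembles a fresh draw from the design rather than records correlated with the training data.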

A Primer on Visualizations for Comparing Populations, Including the Issue of Overlapping Confidence Intervals

Wright

Kampel

The American Statistician

2018

Model selection properties of forward selection and sequential cross‐validation for high‐dimensional regression

Lei

2021

Canadian Journal of Statistics

Forward selection (FS) is a popular variable selection method for linear regression. But theoretical understanding of FS with a diverging number of covariates is still limited. We derive sufficient conditions for FS to attain model selection consistency. Our conditions are similar to those for orthogonal matching pursuit, but are obtained using a different argument. When the true model size is unknown, we derive sufficient conditions for model selection consistency of FS with a data-driven stopping rule, based on a sequential variant of cross-validation. As a byproduct of our proofs, we also have a sharp (sufficient and almost necessary) condition for model selection consistency of "wrapper" forward search for linear regression. We illustrate intuition and demonstrate performance of our methods using simulation studies and real datasets.
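
A rough sketch of forward selection with a sequential stopping rule, assuming "sequential" means stopping at the first step whose held-out error fails to improve. For brevity it uses a single training/validation split rather than full K-fold cross-validation and solves the normal equations directly (it will fail on a singular design). It is a simplified stand-in, not the paper's exact procedure, and all names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y, cols):
    """Least-squares coefficients for y on an intercept plus covariates cols."""
    design = [[1.0] + [row[j] for j in cols] for row in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[k] for d in design) for k in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


def sse(rows, y, cols, beta):
    """Sum of squared errors of coefficients beta on (rows, y)."""
    return sum(
        (yi - beta[0] - sum(b * row[j] for b, j in zip(beta[1:], cols))) ** 2
        for row, yi in zip(rows, y)
    )


def forward_select_sequential(train_x, train_y, val_x, val_y):
    """Greedy forward selection, stopped at the first non-improving step."""
    p = len(train_x[0])
    selected = []
    best_val = sse(val_x, val_y, [], ols(train_x, train_y, []))
    while len(selected) < p:
        candidates = [j for j in range(p) if j not in selected]
        # Forward step: the covariate whose addition most reduces training SSE.
        j_best = min(candidates, key=lambda j: sse(
            train_x, train_y, selected + [j], ols(train_x, train_y, selected + [j])))
        cols = selected + [j_best]
        val_err = sse(val_x, val_y, cols, ols(train_x, train_y, cols))
        # Sequential stopping rule: quit as soon as held-out error fails to drop.
        if val_err >= best_val:
            break
        selected, best_val = cols, val_err
    return selected
```

Stopping at the first non-improving step, instead of scanning the whole path for the minimum, keeps the number of fitted models small when the true model is sparse.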