Surveillance is critical to mounting an appropriate and effective response to pandemics. However, aggregated case report data suffers from reporting delays and can lead to misleading inferences. Unlike aggregated case report data, line list data is a table that contains individual features, such as the dates of symptom onset and reporting, for each reported case, which makes it a good source for modeling delays. Current methods for modeling reporting delays are not well suited to line list data, which typically has missing symptom onset dates that are non-ignorable for modeling reporting delays. In this paper, we develop a Bayesian approach that dynamically integrates imputation and estimation for line list data. Specifically, this Bayesian approach can accurately estimate the epidemic curve and instantaneous reproduction numbers, even when most symptom onset dates are missing. The Bayesian approach is also robust to deviations from model assumptions, such as changes in the reporting delay distribution or incorrect specification of the maximum reporting delay. We apply the Bayesian approach to COVID-19 line list data in Massachusetts and find that the reproduction number estimates correspond more closely to the control measures than the estimates based on the reported curve.
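To make the reporting-delay problem concrete, the sketch below shows the simplest possible correction for right truncation: counts for recent onset days are divided by the probability that a case would already have been reported. It assumes every onset date is known (or has already been imputed) and that the delay distribution `delay_pmf` is given; the helper name and all numbers are hypothetical. This is a plain point-estimate illustration, not the Bayesian imputation-and-estimation approach described above, which is what handles the missing onset dates and the uncertainty.

```python
from itertools import accumulate


def nowcast_counts(reported, delay_pmf):
    """Inflate onset-day counts for cases that have not been reported yet.

    reported[t]:  cases with symptom onset on day t that were reported by the
                  last observed day T = len(reported) - 1 (hypothetical input).
    delay_pmf[d]: assumed probability that a case is reported d days after
                  onset; len(delay_pmf) - 1 plays the role of the maximum delay.
    """
    T = len(reported) - 1
    cdf = list(accumulate(delay_pmf))  # P(delay <= d)
    estimates = []
    for t, count in enumerate(reported):
        # For onset day t, only delays up to T - t could have been observed.
        d = min(T - t, len(cdf) - 1)
        estimates.append(count / cdf[d])
    return estimates


reported = [10, 12, 8, 5, 2]       # the most recent onset days look artificially low
delay_pmf = [0.2, 0.3, 0.3, 0.2]   # hypothetical delay distribution, maximum delay 3 days
print(nowcast_counts(reported, delay_pmf))
```

In this toy input the apparent decline over the last three days disappears once truncation is accounted for. If the maximum delay or `delay_pmf` were misspecified, these point estimates would be biased, which is one of the failure modes the abstract says the Bayesian approach is robust to.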
Household contact studies are frequently used in tuberculosis transmission research, and models based on them often focus on transmission within the household. This contradicts recent research suggesting that transmission may be more likely to occur outside the household than within it in the high-burden settings where these studies are frequently conducted. Consequently, most such models would yield biased estimates and could lead to misguided public health interventions. There is a strong need for models that allow concurrent estimation of household and extra-household transmission. In this study, we develop a random directed graph model for tuberculosis transmission, which permits users to build models for both household and extra-household transmission concurrently. Furthermore, our model can estimate the relative frequency of household versus extra-household transmission and consistently produces unbiased estimates for risk factors, regardless of whether community controls are available. We illustrate our approach with a household contact study conducted in Vitoria, Brazil, and our results indicate that extra-household transmission can account for 63% to 98% of M. tuberculosis infections detected during such a study.
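The quantity "relative frequency of household versus extra-household transmission" can be illustrated with a simple competing-hazards calculation. The sketch below assumes that community exposure and each infectious household member contribute independent constant hazards; the function names, hazard values, and contact list are hypothetical. It is a generic illustration of attributing infections to sources, not the random directed graph model developed in this study.

```python
import math


def infection_outcome(k_infectious_household, hazard_community, hazard_per_household_case):
    """Return (P(infected), P(household source | infected)) for one contact.

    Hypothetical model: community exposure and each infectious household
    member contribute independent constant hazards over the study period.
    """
    total = hazard_community + k_infectious_household * hazard_per_household_case
    p_infected = 1.0 - math.exp(-total)
    p_household = (k_infectious_household * hazard_per_household_case / total) if total > 0 else 0.0
    return p_infected, p_household


def extra_household_share(contacts, hazard_community, hazard_per_household_case):
    """Expected fraction of infections among `contacts` acquired outside the household.

    contacts: number of infectious household members for each contact.
    """
    expected_total = 0.0
    expected_extra = 0.0
    for k in contacts:
        p_inf, p_house = infection_outcome(k, hazard_community, hazard_per_household_case)
        expected_total += p_inf
        expected_extra += p_inf * (1.0 - p_house)
    return expected_extra / expected_total


# Hypothetical study: four contacts, most exposed to a single index case.
print(extra_household_share([1, 1, 2, 1], hazard_community=0.3, hazard_per_household_case=0.1))
```

A model restricted to household transmission implicitly sets `hazard_community` to zero, so every infection is attributed to the household; keeping both hazards is what allows the extra-household share to be estimated at all.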
Non-ignorable technical variation is commonly observed across data from multiple experimental runs, platforms, or studies. These so-called batch effects can make it difficult to merge data from multiple sources, as they can severely bias the outcome of the analysis. Many groups have developed approaches for removing batch effects from data, usually by incorporating batch variables into the analysis (one-step correction) or by preprocessing the data prior to the formal or final analysis (two-step correction). One-step correction is often desirable due to its simplicity, but its flexibility is limited, and it can be difficult to include batch variables uniformly when an analysis has multiple stages. Two-step correction allows for richer models of batch mean and variance. However, prior investigation has indicated that two-step correction can lead to incorrect statistical inference in downstream analysis. Generally speaking, two-step approaches introduce a correlation structure in the corrected data, which, if ignored, may lead to either exaggerated or diminished significance in downstream applications such as differential expression analysis. Here, we provide more intuitive and more formal evaluations of the impacts of two-step batch correction than the existing literature. We demonstrate that the undesired impacts of two-step correction (exaggerated or diminished significance) depend on both the nature of the study design and the batch effects. We also provide strategies for overcoming these negative impacts in downstream analyses using the estimated correlation matrix of the corrected data. We compare the results of our proposed workflow with the results from other published one-step and two-step methods and show that our methods lead to more consistent false discovery control and power of detection across a variety of batch effect scenarios.
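The correlation structure that two-step correction introduces can be seen in the simplest case, per-batch mean centering. The sketch below assumes iid raw errors with unit variance, a hypothetical unbalanced design, and made-up expression values; the helpers `batch_centering_matrix` and `gls_coef` are illustrative names, not functions from the sva package. It shows that centering induces negative correlation within each batch, and that generalized least squares on the corrected data, using that correlation matrix (via a pseudo-inverse, since it is singular), recovers the same group effect as a one-step model with batch indicators. ComBat additionally adjusts variances and shrinks batch estimates, and this toy case does not address degrees of freedom.

```python
import numpy as np


def batch_centering_matrix(batch):
    """Matrix C such that C @ y subtracts each batch's sample mean from y."""
    batch = np.asarray(batch)
    n = batch.size
    H = np.zeros((n, n))
    for b in np.unique(batch):
        idx = np.flatnonzero(batch == b)
        H[np.ix_(idx, idx)] = 1.0 / idx.size  # within-batch averaging operator
    return np.eye(n) - H


def gls_coef(X, y, sigma):
    """Generalized least squares using a pseudo-inverse, since sigma may be singular."""
    W = np.linalg.pinv(sigma)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)


batch = np.array([0, 0, 0, 1, 1])              # unbalanced batches (hypothetical)
group = np.array([1.0, 0.0, 1.0, 0.0, 1.0])    # biological group of interest
y_raw = np.array([2.0, 0.5, 1.7, 3.1, 4.0])    # hypothetical expression values

C = batch_centering_matrix(batch)
y_corrected = C @ y_raw  # two-step: correct first, analyze later

# With iid unit-variance raw errors, the corrected errors have covariance C @ C.T:
# negative correlation within each batch, zero correlation across batches.
sigma_corrected = C @ C.T

# GLS on the corrected data, accounting for its correlation structure.
beta_gls = gls_coef(group.reshape(-1, 1), y_corrected, sigma_corrected)

# One-step alternative: fit group and batch indicators together on the raw data.
design = np.column_stack([group, batch == 0, batch == 1]).astype(float)
beta_one_step = np.linalg.lstsq(design, y_raw, rcond=None)[0]

print(beta_gls[0], beta_one_step[0])  # the two group-effect estimates agree
```

Treating `y_corrected` as independent observations would ignore the within-batch correlation in `sigma_corrected`; how much that distorts significance depends on how the groups are spread across batches, consistent with the dependence on study design described above.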
Software for our method is available through GitHub (https://github.com/jtleek/sva-devel) and will be available in future versions of the sva R package in the Bioconductor project (https://bioconductor.org/packages/release/bioc/html/sva.html).
Keywords: Batch effect; Two-step batch adjustment; ComBat; Sample correlation adjustment; Generalized least squares
The internal validity of an observational study is often subject to debate. In this study, we define the counterfactuals as the unobserved sample and aim to quantify their relationship with null hypothesis statistical testing (NHST). We propose the probability that a causal inference is robust for internal validity (PIV) as a robustness index of causal inference. Formally, the PIV is the probability of rejecting the null hypothesis again based on both the observed sample and the counterfactuals, given that the same null hypothesis has already been rejected based on the observed sample. Under either a frequentist or a Bayesian framework, one can bound the PIV of an inference based on one's bounded beliefs about the counterfactuals, which is often needed when the unconfoundedness assumption is dubious. The PIV is equivalent to statistical power when the NHST is considered to be based on both the observed sample and the counterfactuals. We summarize the process of evaluating internal validity with the PIV as an eight-step procedure and illustrate it with an empirical example (Hong and Raudenbush, 2005).
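The definition of the PIV, and its equivalence to power, can be made concrete in a deliberately simple setting. The sketch below assumes a one-sided z-test of H0: μ = 0 with known standard deviation σ, a fixed observed sample that already rejects H0, and a belief about the counterfactual sample reduced to a single mean `mean_cf`; the function name and numbers are hypothetical. This is a simplification for illustration, not the authors' eight-step procedure. Bounding a belief would correspond to evaluating the function over a range of `mean_cf` values.

```python
from math import sqrt
from statistics import NormalDist


def piv_one_sided_z(n_obs, mean_obs, m_cf, mean_cf, sigma, alpha=0.05):
    """PIV for a one-sided z-test of H0: mu = 0 vs H1: mu > 0 (hypothetical setup).

    The observed sample (size n_obs, mean mean_obs) is fixed. The counterfactual
    sample of size m_cf is believed to have mean mean_cf, and every unit has
    known standard deviation sigma. Returns the probability that the z-test on
    the pooled n_obs + m_cf units rejects H0 again.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha)
    z_obs = mean_obs * sqrt(n_obs) / sigma
    if z_obs <= z_crit:
        raise ValueError("the PIV is defined only after H0 is rejected on the observed sample")
    # The pooled test rejects when the pooled sum of outcomes exceeds this threshold.
    threshold = z_crit * sigma * sqrt(n_obs + m_cf)
    # Only the counterfactual sum is random: Normal(m_cf * mean_cf, m_cf * sigma**2).
    cf_sum = NormalDist(mu=m_cf * mean_cf, sigma=sigma * sqrt(m_cf))
    return 1 - cf_sum.cdf(threshold - n_obs * mean_obs)


# Hypothetical inference: 50 observed units with mean 0.4 (z is about 2.83, so H0 is rejected).
for mean_cf in (0.4, 0.0, -0.4):
    print(mean_cf, piv_one_sided_z(50, 0.4, 50, mean_cf, sigma=1.0))
```

The returned value has the same form as a power calculation for the pooled test, with the belief about the counterfactuals playing the role of the alternative hypothesis. The inference is highly robust if the counterfactuals resemble the observed sample and fragile if they point the other way.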
Surveillance is key to controlling the COVID-19 pandemic, but it typically suffers from reporting delays and can therefore be misleading. Previous methods for adjusting for reporting delays are not well suited to line list data, which usually contain many missing values that are non-ignorable for modeling reporting delays. In this paper, we develop a Bayesian approach that dynamically integrates imputation and estimation for line list data. We show that this Bayesian approach leads to accurate estimates of the epidemic curve and time-varying reproductive numbers and is robust to deviations from model assumptions. We apply the Bayesian approach to COVID-19 line list data in Massachusetts and find that the reproductive number estimates correspond more closely to the control measures than those based on the reported curve.
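Time-varying reproductive numbers are commonly defined through the renewal equation: cases on day t divided by the infection pressure from earlier cases, weighted by the serial interval distribution. The sketch below computes that ratio as a plain point estimate. It assumes a known serial interval distribution, and the function name and inputs are hypothetical. It omits the smoothing windows and priors that practical estimators use and is not the authors' Bayesian method. In the setting described above, `incidence` would be the reconstructed onset curve rather than the reported curve.

```python
import math


def instantaneous_rt(incidence, serial_interval):
    """Point estimate R_t = I_t / sum_{s>=1} w_s * I_{t-s}.

    incidence[t]:         cases with symptom onset on day t (e.g. a nowcast curve).
    serial_interval[s-1]: assumed probability w_s that the serial interval is s days.
    Days with no infection pressure yet (denominator 0) get NaN.
    """
    rt = []
    for t, cases in enumerate(incidence):
        pressure = sum(
            w * incidence[t - s]
            for s, w in enumerate(serial_interval, start=1)
            if t - s >= 0
        )
        rt.append(cases / pressure if pressure > 0 else math.nan)
    return rt


incidence = [1, 2, 4, 8, 16]     # hypothetical doubling epidemic
serial_interval = [0.5, 0.5]     # hypothetical: serial interval of 1 or 2 days
print(instantaneous_rt(incidence, serial_interval))
```

Because the estimate depends on the most recent days of the curve, feeding it a reported curve that is still missing delayed cases pulls R_t downward at the end of the series, which is why correcting for reporting delays matters before estimating reproductive numbers.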