Persons living with HIV engage in routine clinical care, generating large amounts of data in observational HIV cohorts. These data are often error‐prone, and directly using them in biomedical research could bias estimation and give misleading results. A cost‐effective solution is the two‐phase design, under which the error‐prone variables are observed for all patients during Phase I, and that information is used to select patients for data auditing during Phase II. For example, the Caribbean, Central, and South America network for HIV epidemiology (CCASAnet) selected a random sample from each site for data auditing. Herein, we consider efficient odds ratio estimation with partially audited, error‐prone data. We propose a semiparametric approach that uses all information from both phases and accommodates a number of error mechanisms. We allow both the outcome and covariates to be error‐prone and these errors to be correlated, and selection of the Phase II sample can depend on Phase I data in an arbitrary manner. We devise a computationally efficient, numerically stable EM algorithm to obtain estimators that are consistent, asymptotically normal, and asymptotically efficient. We demonstrate the advantages of the proposed methods over existing ones through extensive simulations. Finally, we provide applications to the CCASAnet cohort.
Objective To develop and validate algorithms for predicting 30-day fatal and nonfatal opioid-related overdose using statewide data sources including prescription drug monitoring program data, Hospital Discharge Data System data, and Tennessee (TN) vital records. Current overdose prevention efforts in TN rely on descriptive and retrospective analyses without prognostication. Materials and Methods Study data included 3 041 668 TN patients with 71 479 191 controlled substance prescriptions from 2012 to 2017. Statewide data and socioeconomic indicators were used to train, ensemble, and calibrate 10 nonparametric “weak learner” models. Validation was performed using area under the receiver operating curve (AUROC), area under the precision recall curve, risk concentration, and Spiegelhalter z-test statistic. Results Within 30 days, 2574 fatal overdoses occurred after 4912 prescriptions (0.0069%) and 8455 nonfatal overdoses occurred after 19 460 prescriptions (0.027%). Discrimination and calibration improved after ensembling (AUROC: 0.79–0.83; Spiegelhalter P value: 0–.12). Risk concentration captured 47–52% of cases in the top quantiles of predicted probabilities. Discussion Partitioning and ensembling enabled all study data to be used given computational limits and helped mediate case imbalance. Predicting risk at the prescription level can aggregate risk to the patient, provider, pharmacy, county, and regional levels. Implementing these models into Tennessee Department of Health systems might enable more granular risk quantification. Prospective validation with more recent data is needed. Conclusion Predicting opioid-related overdose risk at statewide scales remains difficult and models like these, which required a partnership between an academic institution and state health agency to develop, may complement traditional epidemiological methods of risk identification and inform public health decisions.
Measurement errors are present in many data collection procedures and can harm analyses by biasing estimates. To correct for measurement error, researchers often validate a subsample of records and then incorporate the information learned from this validation sample into estimation. In practice, the validation sample is often selected using simple random sampling (SRS). However, SRS leads to inefficient estimates because it ignores information on the error-prone variables, which can be highly correlated to the unknown truth. Applying and extending ideas from the two-phase sampling literature, we propose optimal and nearly optimal designs for selecting the validation sample in the classical measurement-error framework. We target designs to improve the efficiency of model-based and design-based estimators, and show how the resulting designs compare to each other. Our results suggest that sampling schemes that extract more information from the error-prone data are substantially more efficient than SRS, for both design-and model-based estimators. The optimal procedure, however, depends on the analysis method, and can differ substantially. This is supported by theory and simulations. We illustrate the various designs using data from an HIV cohort study.
Introduction:Audits play a critical role in maintaining the integrity of observational cohort data. While previous work has validated the audit process, sending trained auditors to sites (“travel-audits”) can be costly. We investigate the efficacy of training sites to conduct “self-audits.”Methods:In 2017, eight research groups in the Caribbean, Central, and South America network for HIV Epidemiology each audited a subset of their patient records randomly selected by the data coordinating center at Vanderbilt. Designated investigators at each site compared abstracted research data to the original clinical source documents and captured audit findings electronically. Additionally, two Vanderbilt investigators performed on-site travel-audits at three randomly selected sites (one adult and two pediatric) in late summer 2017.Results:Self- and travel-auditors, respectively, reported that 93% and 92% of 8919 data entries, captured across 28 unique clinical variables on 65 patients, were entered correctly. Across all entries, 8409 (94%) received the same assessment from self- and travel-auditors (7988 correct and 421 incorrect). Of 421 entries mutually assessed as “incorrect,” 304 (82%) were corrected by both self- and travel-auditors and 250 of these (72%) received the same corrections. Reason for changing antiretroviral therapy (ART) regimen, ART end date, viral load value, CD4%, and HIV diagnosis date had the most mismatched corrections.Conclusions:With similar overall error rates, findings suggest that data audits conducted by trained local investigators could provide an alternative to on-site audits by external auditors to ensure continued data quality. However, discrepancies observed between corrections illustrate challenges in determining correct values even with audits.
In modern observational studies using electronic health records or other routinely collected data, both the outcome and covariates of interest can be error‐prone and their errors often correlated. A cost‐effective solution is the two‐phase design, under which the error‐prone outcome and covariates are observed for all subjects during the first phase and that information is used to select a validation subsample for accurate measurements of these variables in the second phase. Previous research on two‐phase measurement error problems largely focused on scenarios where there are errors in covariates only or the validation sample is a simple random sample of study subjects. Herein, we propose a semiparametric approach to general two‐phase measurement error problems with a quantitative outcome, allowing for correlated errors in the outcome and covariates and arbitrary second‐phase selection. We devise a computationally efficient and numerically stable expectation‐maximization algorithm to maximize the nonparametric likelihood function. The resulting estimators possess desired statistical properties. We demonstrate the superiority of the proposed methods over existing approaches through extensive simulation studies, and we illustrate their use in an observational HIV study.
Observational databases provide unprecedented opportunities for secondary use in biomedical research. However, these data can be error‐prone and must be validated before use. It is usually unrealistic to validate the whole database because of resource constraints. A cost‐effective alternative is a two‐phase design that validates a subset of records enriched for information about a particular research question. We consider odds ratio estimation under differential outcome and exposure misclassification and propose optimal designs that minimize the variance of the maximum likelihood estimator. Our adaptive grid search algorithm can locate the optimal design in a computationally feasible manner. Because the optimal design relies on unknown parameters, we introduce a multiwave strategy to approximate the optimal design. We demonstrate the proposed design's efficiency gains through simulations and two large observational studies.
Background: Studies in women have shown an increased risk of human immunodeficiency virus (HIV) acquisition with prior human papilloma virus (HPV) infection; however, few studies have been conducted among men. Our objective was to assess whether HPV-related external genital lesions (EGLs) increase risk of HIV seroconversion among men.Methods: A total of 1379 HIV-negative men aged 18 to 70 years from the United States, Mexico, and Brazil were followed for up to 7 years and underwent clinical examination for EGLs and blood draws every 6 months. Human immunodeficiency virus seroconversion was assessed in archived serum. Cox proportional hazards and marginal structural models assessed the association between EGL status and time to HIV seroconversion.Results: Twenty-nine participants HIV seroconverted during follow-up.Older age was associated with a lower hazard of HIV seroconversion. We found no significant difference in the risk of HIV seroconversion between men with and without EGLs (adjusted hazard ratio, 0.94; 95% confidence interval, 0.32-2.74). Stratified analyses focusing on men that have sex with men found no association between EGLs and HIV seroconversion risk (hazards ratio, 0.63; 95% confidence interval, 0.00-1.86).Conclusions: External genital lesions were not associated with higher risk for HIV seroconversion in this multinational population, although statistical power was limited as there were few HIV seroconversions. Results may differ in populations at higher risk for HIV.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.