Juan M. Banda scite author profile

Observational research promises to complement experimental research by providing large, diverse populations that would be infeasible for an experiment. Observational research can test its own clinical hypotheses, and observational studies also can contribute to the design of experiments and inform the generalizability of experimental research. Understanding the diversity of populations and the variance in care is one component. In this study, the Observational Health Data Sciences and Informatics (OHDSI) collaboration created an international data network with 11 data sources from four countries, including electronic health records and administrative claims data on 250 million patients. All data were mapped to common data standards, patient privacy was maintained by using a distributed model, and results were aggregated centrally. Treatment pathways were elucidated for type 2 diabetes mellitus, hypertension, and depression. The pathways revealed that the world is moving toward more consistent therapy over time across diseases and across locations, but significant heterogeneity remains among sources, pointing to challenges in generalizing clinical trial results. Diabetes favored a single first-line medication, metformin, to a much greater extent than hypertension or depression. About 10% of diabetes and depression patients and almost 25% of hypertension patients followed a treatment pathway that was unique within the cohort. Aside from factors such as sample size and underlying population (academic medical center versus general population), electronic health records data and administrative claims data revealed similar results. Large-scale international observational research is feasible.

observational research | data network | treatment pathways

A learning health system (1) must systematically evaluate the effects of medical interventions to enable evidence-based medical decision-making. Randomized clinical trials serve as the cornerstone for causal evidence about medical products (2, 3), but evidence from these trials may be limited by an insufficient number of persons exposed, insufficient length of exposure, and inadequate coverage of the target population, factors that limit external generalizability. Observational studies can contribute to the larger goal of causal inference at three stages: (i) the design of experiments, such as determining what are the current therapies that should be compared with a new therapy; (ii) the direct testing of clinical hypotheses on observational data (4-8) using methods to correct for nonrandom treatment assignment as part of the effect estimation process; and (iii) better understanding of population characteristics to improve the extrapolation of both observational and experimental results to new groups.

Without sufficiently broad databases available in the first stage, randomized trials are designed without explicit knowledge of actual disease status and treatment practice. Literature reviews are restricted to the population choices of previous investigations, and pilot studi...


Advances in Electronic Phenotyping: From Rule-Based Definitions to Machine Learning Models

Banda, Seneviratne, Hernandez‐Boussard, et al. 2018. Annu. Rev. Biomed. Data Sci.

154

147


With the widespread adoption of electronic health records (EHRs), large repositories of structured and unstructured patient data are becoming available to conduct observational studies. Finding patients with specific conditions or outcomes, known as phenotyping, is one of the most fundamental research problems encountered when using these new EHR data. Phenotyping forms the basis of translational research, comparative effectiveness studies, clinical decision support, and population health analyses using routinely collected EHR data. We review the evolution of electronic phenotyping, from the early rule-based methods to the cutting edge of supervised and unsupervised machine learning models. We aim to cover the most influential papers in commensurate detail, with a focus on both methodology and implementation. Finally, future research directions are explored.


A curated and standardized adverse drug event resource to accelerate drug safety research

et al. 2016


Identification of adverse drug reactions (ADRs) during the post-marketing phase is one of the most important goals of drug safety surveillance. Spontaneous reporting systems (SRS) data, which are the mainstay of traditional drug safety surveillance, are used for hypothesis generation and to validate the newer approaches. The publicly available US Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) data requires substantial curation before they can be used appropriately, and applying different strategies for data cleaning and normalization can have material impact on analysis results. We provide a curated and standardized version of FAERS removing duplicate case records, applying standardized vocabularies with drug names mapped to RxNorm concepts and outcomes mapped to SNOMED-CT concepts, and pre-computed summary statistics about drug-outcome relationships for general consumption. This publicly available resource, along with the source code, will accelerate drug safety research by reducing the amount of time spent performing data management on the source FAERS reports, improving the quality of the underlying data, and enabling standardized analyses using common vocabularies.


Risk of hydroxychloroquine alone and in combination with azithromycin in the treatment of rheumatoid arthritis: a multinational, retrospective study

Lane, Weaver, Kostka, et al. 2020. The Lancet Rheumatology

128

117


Summary

Background Hydroxychloroquine, a drug commonly used in the treatment of rheumatoid arthritis, has received much negative publicity for adverse events associated with its authorisation for emergency use to treat patients with COVID-19 pneumonia. We studied the safety of hydroxychloroquine, alone and in combination with azithromycin, to determine the risk associated with its use in routine care in patients with rheumatoid arthritis.

Methods In this multinational, retrospective study, new user cohort studies in patients with rheumatoid arthritis aged 18 years or older and initiating hydroxychloroquine were compared with those initiating sulfasalazine and followed up over 30 days, with 16 severe adverse events studied. Self-controlled case series were done to further establish safety in wider populations, and included all users of hydroxychloroquine regardless of rheumatoid arthritis status or indication. Separately, severe adverse events associated with hydroxychloroquine plus azithromycin (compared with hydroxychloroquine plus amoxicillin) were studied. Data comprised 14 sources of claims data or electronic medical records from Germany, Japan, the Netherlands, Spain, the UK, and the USA. Propensity score stratification and calibration using negative control outcomes were used to address confounding. Cox models were fitted to estimate calibrated hazard ratios (HRs) according to drug use. Estimates were pooled where the I² value was less than 0·4.

Findings The study included 956 374 users of hydroxychloroquine, 310 350 users of sulfasalazine, 323 122 users of hydroxychloroquine plus azithromycin, and 351 956 users of hydroxychloroquine plus amoxicillin. No excess risk of severe adverse events was identified when 30-day hydroxychloroquine and sulfasalazine use were compared. Self-controlled case series confirmed these findings. However, long-term use of hydroxychloroquine appeared to be associated with increased cardiovascular mortality (calibrated HR 1·65 [95% CI 1·12–2·44]). Addition of azithromycin appeared to be associated with an increased risk of 30-day cardiovascular mortality (calibrated HR 2·19 [95% CI 1·22–3·95]), chest pain or angina (1·15 [1·05–1·26]), and heart failure (1·22 [1·02–1·45]).

Interpretation Hydroxychloroquine treatment appears to have no increased risk in the short term among patients with rheumatoid arthritis, but in the long term it appears to be associated with excess cardiovascular mortality. The addition of azithromycin increases the risk of heart failure and cardiovascular mortality even in the short term. We call for careful consideration of the benefit–risk trade-off when counselling those on hydroxychloroquine treatment.

Funding National Institute for Health Research (NIHR) Oxford Biomedical Research Centre, NIHR Senior Research Fellowship programme, US National Institutes of Health, US Depar...


Learning statistical models of phenotypes using noisy labeled training data

et al. 2016


A Large-Scale COVID-19 Twitter Chatter Dataset for Open Scientific Research—An International Collaboration

et al. 2021


As the COVID-19 pandemic continues to spread worldwide, an unprecedented amount of open data is being generated for medical, genetics, and epidemiological research. The unparalleled rate at which many research groups around the world are releasing data and publications on the ongoing pandemic is allowing other scientists to learn from local experiences and data generated on the front lines of the COVID-19 pandemic. However, there is a need to integrate additional data sources that map and measure the role of social dynamics of such a unique worldwide event in biomedical, biological, and epidemiological analyses. For this purpose, we present a large-scale curated dataset of over 1.12 billion tweets, growing daily, related to COVID-19 chatter generated from 1 January 2020 to 27 June 2021 at the time of writing. This data source provides a freely available additional data source for researchers worldwide to conduct a wide and diverse number of research projects, such as epidemiological analyses, emotional and mental responses to social distancing measures, the identification of sources of misinformation, stratified measurement of sentiment towards the pandemic in near real time, among many others.


Deep phenotyping of 34,128 adult patients hospitalised with COVID-19 in an international network study

et al. 2020


Comorbid conditions appear to be common among individuals hospitalised with coronavirus disease 2019 (COVID-19) but estimates of prevalence vary and little is known about the prior medication use of patients. Here, we describe the characteristics of adults hospitalised with COVID-19 and compare them with influenza patients. We include 34,128 (US: 8362, South Korea: 7341, Spain: 18,425) COVID-19 patients, summarising between 4811 and 11,643 unique aggregate characteristics. COVID-19 patients have been majority male in the US and Spain, but predominantly female in South Korea. Age profiles vary across data sources. Compared to 84,585 individuals hospitalised with influenza in 2014-19, COVID-19 patients have more typically been male, younger, and with fewer comorbidities and lower medication use. While protecting groups vulnerable to influenza is likely a useful starting point in the response to COVID-19, strategies will likely need to be broadened to reflect the particular characteristics of individuals being hospitalised with COVID-19.


Precision screening for familial hypercholesterolaemia: a machine learning study applied to electronic health encounter data

Myers, Knowles, Staszak, et al. 2019. The Lancet Digital Health


Background Cardiovascular outcomes for people with familial hypercholesterolaemia can be improved with diagnosis and medical management. However, 90% of individuals with familial hypercholesterolaemia remain undiagnosed in the USA. We aimed to accelerate early diagnosis and timely intervention for more than 1·3 million undiagnosed individuals with familial hypercholesterolaemia at high risk for early heart attacks and strokes by applying machine learning to large health-care encounter datasets.

Methods We trained the FIND FH machine learning model using deidentified health-care encounter data, including procedure and diagnostic codes, prescriptions, and laboratory findings, from 939 clinically diagnosed individuals with familial hypercholesterolaemia (395 of whom had a molecular diagnosis) and 83 136 individuals presumed free of familial hypercholesterolaemia, sampled from four US institutions. The model was then applied to a national health-care encounter database (170 million individuals) and an integrated health-care delivery system dataset (174 000 individuals). Individuals used in model training and those evaluated by the model were required to have at least one cardiovascular disease risk factor (eg, hypertension, hypercholesterolaemia, or hyperlipidemia). A Health Insurance Portability and Accountability Act of 1996-compliant programme was developed to allow providers to receive identification of individuals likely to have familial hypercholesterolaemia in their practice.

Findings Using a model with a measured precision (positive predictive value) of 0·85, recall (sensitivity) of 0·45, area under the precision-recall curve of 0·55, and area under the receiver operating characteristic curve of 0·89, we flagged 1 331 759 of 170 416 201 patients in the national database and 866 of 173 733 individuals in the health-care delivery system dataset as likely to have familial hypercholesterolaemia. Familial hypercholesterolaemia experts reviewed a sample of flagged individuals (45 from the national database and 103 from the health-care delivery system dataset) and applied clinical familial hypercholesterolaemia diagnostic criteria. Of those reviewed, 87% (95% CI 73-100) in the national database and 77% (68-86) in the health-care delivery system dataset were categorised as having a high enough clinical suspicion of familial hypercholesterolaemia to warrant guideline-based clinical evaluation and treatment.

Interpretation The FIND FH model successfully scans large, diverse, and disparate health-care encounter databases to identify individuals with familial hypercholesterolaemia.

Funding The FH Foundation funded this study. Support was received from Amgen, Sanofi, and Regeneron.




Juan M. Banda

Characterizing treatment pathways at scale using the OHDSI network
