COVID-19 transmission models have conferred great value in informing public health understanding, planning, and response. However, the pandemic also demonstrated the infeasibility of basing public health decision-making on transmission models with pre-set assumptions. No matter how favourably evidenced when built, a model with fixed assumptions is challenged by numerous factors that are difficult to predict. Ongoing planning associated with rolling back and re-instituting measures, initiating surge planning, and issuing public health advisories can benefit from approaches that allow state estimates for transmission models to be continuously updated in light of unfolding time series. A model continuously regrounded by empirical data in this way can provide a consistent, integrated depiction of the evolving underlying epidemiology and acute care demand; offer the ability to project such a depiction forward in a fashion suitable for triggering the deployment of acute care surge capacity or public health measures; and support quantitative evaluation of trade-offs associated with prospective interventions in light of the latest estimates of the underlying epidemiology. We describe here the design, implementation, and multi-year daily use for public health and clinical decision-making support of a particle-filtered COVID-19 compartmental model, which served Canadian federal and provincial governments via regular reporting starting in June 2020. The use of the Bayesian sequential Monte Carlo algorithm of particle filtering allows the model to be regrounded daily and to adapt to new trends within daily incoming data, including test volumes and positivity rates, endogenous and travel-related cases, hospital census and admission flows, daily counts of dose-specific vaccinations administered, measured concentration of SARS-CoV-2 in wastewater, and mortality.
Important model outputs include estimates (via sampling) of the count of undiagnosed infectives, the count of individuals at different stages of the natural history of frankly and pauci-symptomatic infection, the current force of infection, the effective reproductive number, and current and cumulative infection prevalence. Following a brief description of model design, we describe how the machine learning algorithm of particle filtering is used to continually reground estimates of dynamic model state, to support probabilistic model projection of epidemiology, health system capacity utilization, and service demand, and to probabilistically evaluate trade-offs between potential intervention scenarios. We further note aspects of model use in practice as an effective reporting tool parameterized by jurisdiction, including a scripting pipeline that fully automates reporting, other than security-restricted retrieval of new data, through automated model deployment, data validity checks, and automatic post-scenario scripting and reporting. As demonstrated by this multi-year deployment of the Bayesian machine learning algorithm of particle filtering to provide industrial-strength reporting informing public health decision-making across Canada, such methods offer strong support for evidence-based public health decision-making informed by ever-current articulated transmission models whose probabilistic state and parameter estimates are continually regrounded by diverse data streams.
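The daily regrounding described above can be illustrated with a minimal bootstrap (sequential importance resampling) particle filter. The sketch below is not the authors' model: it substitutes a toy stochastic SIR compartment structure and a Poisson observation model for reported case counts, and all function names and parameter values (`beta`, `gamma`, the jitter scale) are illustrative assumptions. It shows only the propagate–weight–resample cycle by which particle filtering conditions a transmission model's state on each new datum.

```python
import math
import random

def sir_step(state, beta, gamma, dt=1.0):
    # One deterministic Euler step of a toy SIR model (a stand-in for
    # the paper's richer compartmental structure). Returns the new
    # state and the step's incident infections.
    s, i, r = state
    n = s + i + r
    new_inf = beta * s * i / n * dt
    new_rec = gamma * i * dt
    return (s - new_inf, i + new_inf - new_rec, r + new_rec), new_inf

def poisson_logpmf(k, lam):
    # Log-likelihood of observing k reported cases given expected lam.
    if lam <= 0:
        return float("-inf") if k > 0 else 0.0
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def particle_filter(observed_cases, n_particles=500, beta=0.3, gamma=0.1,
                    pop=10000, i0=10, seed=0):
    """Bootstrap particle filter: each day, propagate every particle
    through the transmission model, weight it by the likelihood of the
    day's reported case count, then resample. Returns the daily
    posterior mean count of infectives."""
    rng = random.Random(seed)
    particles = [(pop - i0, float(i0), 0.0) for _ in range(n_particles)]
    means = []
    for y in observed_cases:
        moved, weights = [], []
        for st in particles:
            # Jitter beta per particle so the ensemble can track
            # changing transmission conditions over time.
            b = beta * math.exp(rng.gauss(0, 0.05))
            new_st, incidence = sir_step(st, b, gamma)
            moved.append(new_st)
            weights.append(math.exp(poisson_logpmf(y, max(incidence, 1e-9))))
        total = sum(weights) or 1.0
        probs = [w / total for w in weights]
        # Multinomial resampling regrounds the ensemble on the new datum.
        particles = rng.choices(moved, weights=probs, k=n_particles)
        means.append(sum(p[1] for p in particles) / n_particles)
    return means
```

Projections of the kind the abstract describes follow naturally: running `sir_step` forward from the resampled ensemble, without further weighting, yields a probabilistic forecast grounded in the latest state estimates.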
Identifying the causal relationships between subjects or variables remains an important problem across various scientific fields. This is particularly important but challenging in complex systems, such as those involving human behavior, sociotechnical contexts, and natural ecosystems. By exploiting state space reconstruction via lagged embedding of time series, convergent cross mapping (CCM) serves as an important method for addressing this problem. While powerful, CCM is computationally costly; moreover, CCM results are highly sensitive to several parameter values. While best practice entails exploring a range of parameter settings when assessing causal relationships, the resulting computational burden can raise barriers to practical use, especially for long time series exhibiting weak causal linkages. We demonstrate here several means of accelerating CCM by harnessing the distributed Apache Spark platform. We characterize and report on results of several experiments with parallelized solutions that demonstrate high scalability and a capacity for over an order of magnitude performance improvement over the baseline configuration. Such economies in computation time can speed learning and robust identification of causal drivers in complex systems.
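The serial CCM core that such Spark-based parallelization distributes across parameter settings and library sizes can be sketched as follows. This is a minimal illustration, not the paper's implementation: embedding dimension `E`, lag `tau`, and the exponential neighbour weighting follow the standard CCM formulation, and the function names are assumptions.

```python
import math

def lagged_embedding(series, E, tau=1):
    """Takens-style shadow manifold: E-dimensional lagged coordinate
    vectors reconstructed from a single time series."""
    start = (E - 1) * tau
    return [tuple(series[t - j * tau] for j in range(E))
            for t in range(start, len(series))]

def cross_map_skill(x, y, E=2, tau=1):
    """Estimate how well x's shadow manifold predicts y (CCM skill).

    For each point on the manifold of x, find its E+1 nearest
    neighbours (excluding itself), form an exponentially weighted
    average of the corresponding y values, and return the Pearson
    correlation between predictions and observed y."""
    manifold = lagged_embedding(x, E, tau)
    offset = (E - 1) * tau
    targets = y[offset:]
    preds = []
    for i, p in enumerate(manifold):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(manifold) if j != i)
        nearest = dists[:E + 1]
        d0 = nearest[0][0] or 1e-12  # guard against zero distance
        ws = [math.exp(-d / d0) for d, _ in nearest]
        total = sum(ws)
        preds.append(
            sum(w * targets[j] for w, (_, j) in zip(ws, nearest)) / total)
    n = len(preds)
    mp, mt = sum(preds) / n, sum(targets) / n
    cov = sum((a - mp) * (b - mt) for a, b in zip(preds, targets))
    sp = math.sqrt(sum((a - mp) ** 2 for a in preds))
    st = math.sqrt(sum((b - mt) ** 2 for b in targets))
    return cov / (sp * st) if sp and st else 0.0
```

The nearest-neighbour search inside the loop is the quadratic-cost hotspot; because skill must be recomputed across many `(E, tau, library size)` combinations to assess convergence, those independent evaluations are natural units of work to distribute.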
Background: The use of social media data provides an opportunity to complement traditional influenza and COVID-19 surveillance methods for the detection and control of outbreaks and for informing public health interventions.

Objective: The first aim of this study is to investigate the degree to which Twitter users disclose health experiences related to influenza and COVID-19 that could be indicative of recent plausible influenza cases or symptomatic COVID-19 infections. Second, we seek to use the Twitter datasets to train and evaluate the classification performance of Bidirectional Encoder Representations from Transformers (BERT) and variant language models in the context of influenza and COVID-19 infection detection.

Methods: We constructed two Twitter datasets using a keyword-based filtering approach on English-language tweets collected from December 2016 to December 2022 in Saskatchewan, Canada. The influenza-related dataset comprised tweets filtered with influenza-related keywords from December 13, 2016, to March 17, 2018, while the COVID-19 dataset comprised tweets filtered with COVID-19 symptom-related keywords from January 1, 2020, to June 22, 2021. The Twitter datasets were cleaned, and each tweet was annotated by at least two annotators as to whether it suggested recent plausible influenza cases or symptomatic COVID-19 cases. We then assessed the classification performance of pre-trained transformer-based language models, including BERT-base, BERT-large, RoBERTa-base, RoBERTa-large, BERTweet-base, BERTweet-covid-base, BERTweet-large, and COVID-Twitter-BERT (CT-BERT), on each dataset. To address the notable class imbalance, we experimented with both oversampling and undersampling methods.

Results: The influenza dataset had 1129 out of 6444 (17.5%) tweets annotated as suggesting recent plausible influenza cases. The COVID-19 dataset had 924 out of 11,939 (7.7%) tweets annotated as suggesting recent plausible COVID-19 cases.
When compared against other language models on the COVID-19 dataset, CT-BERT performed best, achieving the highest recall (94.8%), F1 score (94.4%), and accuracy (94.6%). For the influenza dataset, BERTweet models exhibited better performance. Our results also showed that applying data balancing techniques such as oversampling or undersampling did not lead to improved model performance.

Conclusions: Utilizing domain-specific language models for monitoring users' health experiences related to influenza and COVID-19 on social media shows improved classification performance and has the potential to supplement real-time disease surveillance.
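The random oversampling and undersampling techniques the study experimented with (and found did not improve performance on these imbalanced datasets) can be sketched in a few lines. This is a generic illustration, not the authors' code; the function names and the tweet/label representation are assumptions.

```python
import random
from collections import Counter

def oversample(texts, labels, seed=0):
    """Random oversampling: duplicate randomly chosen minority-class
    examples until every class matches the majority class count."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_t, out_l = list(texts), list(labels)
    for cls, n in counts.items():
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        for _ in range(target - n):
            i = rng.choice(idx)
            out_t.append(texts[i])
            out_l.append(labels[i])
    return out_t, out_l

def undersample(texts, labels, seed=0):
    """Random undersampling: discard majority-class examples at random
    down to the minority class count."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_t, out_l = [], []
    for cls in counts:
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        for i in rng.sample(idx, target):
            out_t.append(texts[i])
            out_l.append(labels[i])
    return out_t, out_l
```

With roughly 7.7% positive tweets in the COVID-19 dataset, oversampling multiplies the positives about twelvefold while undersampling discards most negatives, which is one plausible reason neither helped: the former adds no new information and the latter removes a large share of it.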