Public health surveillance is the foundation of effective public health practice. Public health surveillance is defined as the ongoing systematic collection, analysis, and interpretation of data, closely integrated with the dissemination of these data to the public health practitioners, clinicians, and policy makers responsible for preventing and controlling disease and injury.1 Ideally, surveillance systems should support timely, efficient, flexible, scalable, and interoperable data acquisition, analysis, and dissemination. However, many current systems rely on disease-specific approaches that inhibit efficiency and interoperability (eg, manual data entry and data recoding that place a substantial burden on data partners) and use slow, inefficient, out-of-date technologies that no longer meet user needs for data management, analysis, visualization, and dissemination.2-4 Advances in information technology, data science, analytic methods, and information sharing provide an opportunity to substantially enhance surveillance. As a global leader in public health surveillance, the Centers for Disease Control and Prevention (CDC) is working with public health partners to transform and modernize CDC's surveillance systems and approaches. Here, we describe recent enhancements in surveillance data analysis and visualization, information sharing, and dissemination at CDC and identify the challenges ahead.
Objectives: Federal open-data initiatives that promote increased sharing of federally collected data are important for transparency, data quality, trust, and relationships with the public and state, tribal, local, and territorial partners. These initiatives advance understanding of health conditions and diseases by providing data to researchers, scientists, and policymakers for analysis, collaboration, and use outside the Centers for Disease Control and Prevention (CDC), particularly for emerging conditions such as COVID-19, for which data needs are constantly evolving. Since the beginning of the pandemic, CDC has collected person-level, de-identified data from jurisdictions and currently has more than 8 million records. We describe how CDC designed and produces 2 de-identified public datasets from these collected data. Methods: We included data elements based on usefulness, public request, and privacy implications; we suppressed some field values to reduce the risk of re-identification and exposure of confidential information. We created datasets and verified them for privacy and confidentiality by using data management platform analytic tools and R scripts. Results: Unrestricted data are available to the public through Data.CDC.gov, and restricted data, with additional fields, are available with a data-use agreement through a private repository on GitHub.com. Practice Implications: Enriched understanding of the available public data, the methods used to create these data, and the algorithms used to protect the privacy of de-identified people allows for improved data use. Automating data-generation procedures improves the volume and timeliness of sharing data.
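The field-value suppression and verification steps described in the Methods can be illustrated with a minimal sketch. This is not CDC's actual algorithm or its R scripts: it assumes a simple small-cell rule in which any combination of quasi-identifying fields shared by fewer than k records is replaced with a placeholder. The field names, the threshold k, the "NA" placeholder, and both function names are hypothetical.

```python
from collections import Counter

SUPPRESSED = "NA"  # hypothetical placeholder for a suppressed value


def suppress_small_cells(records, quasi_identifiers, k=5):
    """Return copies of the records in which the quasi-identifier values are
    replaced by SUPPRESSED whenever their combination occurs fewer than k times."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    result = []
    for r in records:
        key = tuple(r[q] for q in quasi_identifiers)
        copy = dict(r)  # leave the input record untouched
        if counts[key] < k:
            for q in quasi_identifiers:
                copy[q] = SUPPRESSED
        result.append(copy)
    return result


def find_violations(records, quasi_identifiers, k=5):
    """Verification step: list the unsuppressed quasi-identifier combinations
    that still occur fewer than k times."""
    fully_suppressed = tuple(SUPPRESSED for _ in quasi_identifiers)
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return [key for key, n in counts.items()
            if key != fully_suppressed and n < k]


if __name__ == "__main__":
    # Illustrative records only; these are not real surveillance data.
    sample = [
        {"county": "A", "age_group": "18-29", "sex": "F"},
        {"county": "A", "age_group": "18-29", "sex": "M"},
        {"county": "A", "age_group": "18-29", "sex": "F"},
        {"county": "B", "age_group": "80+", "sex": "F"},
    ]
    released = suppress_small_cells(sample, ["county", "age_group"], k=3)
    for row in released:
        print(row)
    print("violations:", find_violations(released, ["county", "age_group"], k=3))
```

Keeping suppression and verification as separate functions mirrors the create-then-verify workflow in the abstract: the check can be rerun on every automated release without repeating the suppression logic.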
Historically, public health surveillance systems have been designed and operated as registries targeting specific health issues. These systems included data from specifically targeted segments of the population, with data elements designed to answer specific programmatic questions. The result has been a collection of siloed information systems that can rarely be used to address new needs without extensive revision, rework, or redesign. This decreases the opportunities for cross-communication between programmatic areas and limits the ability of public health professionals to examine issues that cross traditional programmatic boundaries. Emerging public health threats often require the coordination of stakeholders from different areas of public health practice; the 2009 H1N1 influenza pandemic posed one such challenge. To avoid the problems of siloed information systems, the US Centers for Disease Control and Prevention's (CDC's) National Center for Public Health Informatics (NCPHI) and its partners began exploring and developing research for a decentralized information architecture through a Public Health Grid (PHGrid). Through systems research and the exploration of PHGrid capabilities, the CDC was able to develop a pilot project that enabled secure and timely exchange of information across multiple programmatic areas. This paper describes the process and results of the pilot project.