Background: There are many benefits to open datasets. However, privacy concerns have hampered the widespread creation of open health data. There is a dearth of documented methods and case studies for the creation of public-use health data. We describe a new methodology for creating a longitudinal public health dataset in the context of the Heritage Health Prize (HHP). The HHP is a global data mining competition to predict, by using claims data, the number of days patients will be hospitalized in a subsequent year. The winner will be the team or individual with the most accurate model past a threshold accuracy, and will receive a US $3 million cash prize. HHP began on April 4, 2011, and ends on April 3, 2013. Objective: To de-identify the claims data used in the HHP competition and ensure that it meets the requirements in the US Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule. Methods: We defined a threshold risk consistent with the HIPAA Privacy Rule Safe Harbor standard for disclosing the competition dataset. Three plausible re-identification attacks that can be executed on these data were identified. For each attack the re-identification probability was evaluated. If it was deemed too high then a new de-identification algorithm was applied to reduce the risk to an acceptable level. We performed an actual evaluation of re-identification risk using simulated attacks and matching experiments to confirm the results of the de-identification and to test sensitivity to assumptions. The main metric used to evaluate re-identification risk was the probability that a record in the HHP data can be re-identified given an attempted attack. Results: An evaluation of the de-identified dataset estimated that the probability of re-identifying an individual was .0084, below the .05 probability threshold specified for the competition. The risk was robust to violations of our initial assumptions. Conclusions: It was possible to ensure that the probability of re-identification for a large longitudinal dataset was acceptably low when it was released for a global user community in support of an analytics competition. This is an example of, and methodology for, achieving open data principles for longitudinal health data.
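As a rough illustration of the kind of risk metric this abstract describes, the sketch below estimates a per-record re-identification probability as 1 divided by the size of the record's equivalence class on a set of quasi-identifiers, then compares the average with the .05 threshold. The equivalence-class estimator, the column names, and the toy records are assumptions made for illustration; the abstract does not state how the probability was actually computed.

```python
from collections import Counter

THRESHOLD = 0.05  # probability threshold stated in the abstract


def record_risks(records, quasi_identifiers):
    """Per-record risk, assumed here to be 1 / (size of the equivalence class)."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    sizes = Counter(keys)
    return [1.0 / sizes[k] for k in keys]


def average_risk(records, quasi_identifiers):
    """Mean of the per-record risks over the whole dataset."""
    risks = record_risks(records, quasi_identifiers)
    return sum(risks) / len(risks)


# Toy data (hypothetical): two equivalence classes of 30 and 50 records.
records = (
    [{"age_group": "40-49", "sex": "F"}] * 30
    + [{"age_group": "50-59", "sex": "M"}] * 50
)
risk = average_risk(records, ["age_group", "sex"])
print(f"average risk = {risk:.4f}, within threshold: {risk <= THRESHOLD}")
```

With this estimator the average risk equals the number of equivalence classes divided by the number of records, so coarser quasi-identifiers lower the estimate.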
The home healthcare initiative aims to reduce readmission costs, transportation costs, and hospital medical errors, to improve the quality of post-hospitalization healthcare, and to enhance patients' independence at home. Today, it is almost unimaginable to consider this initiative without information technology. Home healthcare robots are one such emerging technology. Several robots have been developed to facilitate home healthcare, such as remote presence robots (e.g., RP2) and Paro. Most previous research in this area has focused on technology and implementation issues of home healthcare robots, but ignored the factors that influence their adoption. To address this limitation, the current research applied and extended the UTAUT model to the home healthcare domain. The model was tested using a survey questionnaire. The empirical results not only confirmed the effects of some constructs from the original UTAUT model but also identified perceived security as a new factor that directly affects usage intention of home healthcare robots. In addition, effort expectancy did not show a direct effect on usage intention, but an indirect effect through performance expectancy. Several practical and theoretical implications are also discussed.
Objective Development of systematic approaches for understanding and assessing data quality is becoming increasingly important as the volume and utilization of health data steadily increases. In this study, a taxonomy of data defects was developed and utilized when automatically detecting defects to assess Medicaid data quality maintained by one of the states in the United States. Materials and Methods There were more than 2.23 million rows and 32 million cells in the Medicaid data examined. The taxonomy was developed through document review, descriptive data analysis, and literature review. A software program was created to automatically detect defects by using a set of constraints whose development was facilitated by the taxonomy. Results Five major categories and seventeen subcategories of defects were identified. The major categories are missingness, incorrectness, syntax violation, semantic violation, and duplicity. More than 3 million defects were detected indicating substantial problems with data quality. Defect density exceeded 10% in five tables. The majority of the data defects belonged to format mismatch, invalid code, dependency-contract violation, and implausible value types. Such contextual knowledge can support prioritized quality improvement initiatives for the Medicaid data studied. Conclusions This research took the initial steps to understand the types of data defects and detect defects in large healthcare datasets. The results generally suggest that healthcare organizations can potentially benefit from focusing on data quality improvement. For those purposes, the taxonomy developed and the approach followed in this study can be adopted.
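The constraint-based detection described in this abstract might look roughly like the sketch below. The field names, the three constraints, and the definition of defect density as defects per cell are all assumptions for illustration; only the labels "missingness", "format mismatch", and "invalid code" are borrowed from the abstract, and this is not the authors' software.

```python
import re

# Hypothetical constraints: each returns True when the row passes the check.
CONSTRAINTS = {
    "missingness": lambda row: row.get("member_id") not in (None, ""),
    "format mismatch": lambda row: re.fullmatch(
        r"\d{4}-\d{2}-\d{2}", row.get("service_date") or ""
    ) is not None,
    "invalid code": lambda row: row.get("sex") in {"F", "M", "U"},
}


def detect_defects(rows):
    """Return (row index, defect label) for every failed constraint."""
    defects = []
    for i, row in enumerate(rows):
        for label, check in CONSTRAINTS.items():
            if not check(row):
                defects.append((i, label))
    return defects


def defect_density(rows, defects):
    """Defects per cell (an assumed definition of defect density)."""
    cells = sum(len(row) for row in rows)
    return len(defects) / cells if cells else 0.0


# Toy rows (hypothetical): the second row violates all three constraints.
rows = [
    {"member_id": "A1", "service_date": "2012-03-01", "sex": "F"},
    {"member_id": "", "service_date": "03/01/2012", "sex": "X"},
]
defects = detect_defects(rows)
print(defects, defect_density(rows, defects))
```

Keeping the constraints in a dictionary keyed by defect label makes it straightforward to count defects per taxonomy category when prioritizing quality improvements.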
Background: The Canadian Institute for Health Information (CIHI) collects hospital discharge abstract data (DAD) from Canadian provinces and territories. There are many demands for the disclosure of this data for research and analysis to inform policy making. To expedite the disclosure of data for some of these purposes, the construction of a DAD public use microdata file (PUMF) was considered. Such purposes include: confirming some published results, providing broader feedback to CIHI to improve data quality, training students and fellows, providing an easily accessible data set for researchers to prepare for analyses on the full DAD data set, and serving as a large health data set for computer scientists and statisticians to evaluate analysis and data mining techniques. The objective of this study was to measure the probability of re-identification for records in a PUMF, and to de-identify a national DAD PUMF consisting of 10% of records. Methods: Plausible attacks on a PUMF were evaluated. Based on these attacks, the 2008-2009 national DAD was de-identified. A new algorithm was developed to minimize the amount of suppression while maximizing the precision of the data. The acceptable threshold for the probability of correct re-identification of a record was set at between 0.04 and 0.05. Information loss was measured in terms of the extent of suppression and entropy. Results: Two different PUMF files were produced, one with geographic information, and one with no geographic information but more clinical information. At a threshold of 0.05, the maximum proportion of records with the diagnosis code suppressed was 20%, but these suppressions represented only 8-9% of all values in the DAD. Our suppression algorithm has less information loss than a more traditional approach to suppression. Smaller regions, patients with longer stays, and age groups that are infrequently admitted to hospitals tend to be the ones with the highest rates of suppression. Conclusions: The strategies we used to maximize data utility and minimize information loss can result in a PUMF that would be useful for the specific purposes noted earlier. However, to create a more detailed file with less information loss suitable for more complex health services research, the risk would need to be mitigated by requiring the data recipient to commit to a data sharing agreement.
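To make the idea of threshold-driven suppression concrete, the sketch below performs a single, simple suppression pass: a record's diagnosis code is blanked when its equivalence class on the quasi-identifiers is so small that 1 / class size exceeds the threshold. This is a plain baseline written for illustration, not the new algorithm the abstract refers to, and the column names and toy records are hypothetical.

```python
from collections import Counter


def suppress_small_classes(records, quasi_identifiers, target, threshold=0.05):
    """Blank `target` in records whose class risk (1 / class size) exceeds threshold.

    Returns the modified copies and the proportion of records suppressed.
    One pass only: classes are not recomputed after suppression.
    """
    if not records:
        return [], 0.0
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    sizes = Counter(keys)
    out, suppressed = [], 0
    for record, key in zip(records, keys):
        copy = dict(record)  # leave the input records untouched
        if 1.0 / sizes[key] > threshold:
            copy[target] = None
            suppressed += 1
        out.append(copy)
    return out, suppressed / len(records)


# Toy data (hypothetical): one common combination and two rare ones.
records = (
    [{"region": "R1", "age_group": "60-69", "diagnosis": "J18"}] * 25
    + [{"region": "R9", "age_group": "20-29", "diagnosis": "A15"}]
    + [{"region": "R9", "age_group": "90+", "diagnosis": "C34"}]
)
released, proportion = suppress_small_classes(
    records, ["region", "age_group", "diagnosis"], target="diagnosis"
)
print(f"proportion of records suppressed: {proportion:.3f}")
```

In this toy run only the two rare records lose their diagnosis code, which mirrors the abstract's observation that small regions and infrequently admitted age groups see the most suppression.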
Introduction Computable biomedical knowledge artifacts (CBKs) are digital objects conveying biomedical knowledge in machine‐interpretable structures. As more CBKs are produced and their complexity increases, the value obtained from sharing CBKs grows. Mobilizing CBKs and sharing them widely can only be achieved if the CBKs are findable, accessible, interoperable, reusable, and trustable (FAIR+T). To help mobilize CBKs, we describe our efforts to outline metadata categories to make CBKs FAIR+T. Methods We examined the literature regarding metadata with the potential to make digital artifacts FAIR+T. We also examined metadata available online today for actual CBKs of 12 different types. With iterative refinement, we came to a consensus on key categories of metadata that, when taken together, can make CBKs FAIR+T. We use subject‐predicate‐object triples to more clearly differentiate metadata categories. Results We defined 13 categories of CBK metadata most relevant to making CBKs FAIR+T. Eleven of these categories (type, domain, purpose, identification, location, CBK‐to‐CBK relationships, technical, authorization and rights management, provenance, evidential basis, and evidence from use metadata) are evident today where CBKs are stored online. Two additional categories (preservation and integrity metadata) were not evident in our examples. We provide a research agenda to guide further study and development of these and other metadata categories. Conclusion A wide variety of metadata elements in various categories is needed to make CBKs FAIR+T. More work is needed to develop a common framework for CBK metadata that can make CBKs FAIR+T for all stakeholders.
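The use of subject-predicate-object triples to separate metadata categories can be sketched as follows. The category names come from the abstract; the `Triple` type, the `cbk:example-1` identifier, the example values, and the choice to use the category name as the predicate are assumptions made for illustration.

```python
from typing import NamedTuple

# The 13 metadata categories listed in the abstract.
CATEGORIES = [
    "type", "domain", "purpose", "identification", "location",
    "CBK-to-CBK relationships", "technical",
    "authorization and rights management", "provenance",
    "evidential basis", "evidence from use", "preservation", "integrity",
]


class Triple(NamedTuple):
    subject: str    # the CBK being described
    predicate: str  # here, simply the metadata category name
    obj: str        # the metadata value


# Hypothetical metadata for one hypothetical CBK.
triples = [
    Triple("cbk:example-1", "type", "prediction model"),
    Triple("cbk:example-1", "domain", "cardiology"),
    Triple("cbk:example-1", "purpose", "risk estimation"),
    Triple("cbk:example-1", "location", "https://example.org/cbk/1"),
]


def values_for(triples, subject, predicate):
    """All objects recorded for a given subject and predicate."""
    return [t.obj for t in triples if t.subject == subject and t.predicate == predicate]


def missing_categories(triples, subject):
    """Categories for which the subject has no metadata triple."""
    present = {t.predicate for t in triples if t.subject == subject}
    return [c for c in CATEGORIES if c not in present]


print(values_for(triples, "cbk:example-1", "domain"))
print(missing_categories(triples, "cbk:example-1"))
```

Because every statement names its category in the predicate, a simple filter is enough to check which categories a given artifact still lacks.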
Keywords: Home health agencies, health information technology, quality of care, workflow, information management
Summary
Objectives: To help manage the risk of falls in home care, this study aimed to (i) identify home care clinicians' information needs and how they manage missing or inaccurate data, (ii) identify problems that impact effectiveness and efficiency associated with retaining, exchanging, or processing information about fall risks in existing workflows and currently adopted health information technology (IT) solutions, and (iii) offer informatics-based recommendations to improve fall risk management interventions. Methods: A case study was carried out in a single not-for-profit suburban Medicare-certified home health agency with three branches. Qualitative data were collected over a six-month period through observations, semi-structured interviews, and focus groups. The Framework method was used for analysis. Maximum variation sampling was adopted to recruit a diverse sample of clinicians. Results: Overall, the information needs for fall risk management were categorized into physiological, care delivery, educational, social, environmental, and administrative domains. Examples include a brief fall-related patient history, weight-bearing status, medications that affect balance, availability of caregivers at home, and the influence of patients' cultures on fall management interventions. The unavailability and inaccuracy of critical information related to fall risks can delay necessary therapeutic services aimed at reducing patients' risk of falling, thereby jeopardizing their safety. Currently adopted IT solutions did not adequately accommodate data related to fall risk management. Conclusion: The results highlight the essential information for fall risk management in home care. Home care workflows and health IT solutions must effectively and efficiently retain, exchange, and process information necessary for fall risk management. Interoperability and integration of the various health IT solutions to make data sharing accessible to all clinicians is critical for fall risk management. Findings from this study can help home health agencies better understand their information needs to manage fall risks.
Background: Home healthcare, referred to as home care henceforth, is defined as episodic and intermittent secondary care services provided to home-bound patients in their homes. Serving mostly the elderly [1], home care involves skilled care services provided by nurses, physical therapists, occupational therapists, speech therapists, social workers, and home aides [2]. Home care is a critical component in the continuum of care for many patients who require recuperative and rehabilitative services after hospital discharges [3][4][5][6]. It enables patients to achieve better recovery, gain strength, regain functionality, and become independent more quickly [7]. Home care is an important component of the overall healthcare industry in the United States (US) with a projected increase in its utilization and expen...