Child abuse and neglect are public health issues affecting communities throughout the United States. The broad adoption of electronic health records (EHRs) in health care supports the development of machine learning–based models to help identify child abuse and neglect. Using EHR data for child abuse and neglect detection raises several critical ethical considerations. This article applies a phenomenological approach to discuss, and provide recommendations for, key ethical issues in the development and evaluation of machine learning–based risk models: (1) biases in the data; (2) clinical documentation system design issues; (3) the lack of a centralized evidence base for child abuse and neglect; (4) the lack of a “gold standard” in the assessment and diagnosis of child abuse and neglect; (5) challenges in evaluating risk prediction performance; (6) challenges in testing predictive models in practice; and (7) challenges in presenting machine learning–based predictions to clinicians and patients. We provide recommended solutions to each of the 7 ethical challenges and identify several areas for further policy and research.
Objective
The study provides considerations for generating a phenotype of child abuse and neglect in emergency departments (EDs) using secondary data from electronic health records (EHRs). It also discusses implications for reducing racial bias and for developing further decision support tools to assist in identifying child abuse and neglect.
Materials and Methods
We conducted a qualitative study using in-depth interviews with 20 pediatric clinicians working in a single pediatric ED to gain insights about generating an EHR-based phenotype to identify children at risk for abuse and neglect.
Results
Three central themes emerged from the interviews: (1) challenges in diagnosing child abuse and neglect, (2) differences in documentation styles across health disciplines in the EHR, and (3) identification of potential racial bias through documentation.
Discussion
Our findings highlight important considerations for generating a phenotype for child abuse and neglect using EHR data. First, information-related challenges include incomplete previous visit history due to limited health information exchange, and documentation scattered across the EHR. Second, documentation styles differ by health discipline, and clinicians tend to document abuse in different document types within the EHR. Finally, documentation can help identify potential racial bias in the suspicion of child abuse and neglect by revealing potential discrepancies in the quality of care and in the language used to document abuse and neglect.
Conclusions
Our findings highlight challenges in building an EHR-based risk phenotype for child abuse and neglect. Further research is needed to validate these findings and integrate them into the creation of an EHR-based risk phenotype.
The prevalence of patients who are Incapacitated with No Evident Advance Directives or Surrogates (INEADS) remains unknown because such data are not routinely captured in structured electronic health records. This study sought to develop and validate a natural language processing (NLP) algorithm to identify information related to being INEADS from clinical notes. We used a publicly available dataset of critical care patients from 2001 through 2012 at a United States academic medical center, which contained 418,393 relevant clinical notes for 23,904 adult admissions. We developed 17 subcategories indicating reduced or elevated potential for being INEADS, and created a vocabulary of terms and expressions within each. We used an NLP application to create a language model and expand these vocabularies. The NLP algorithm was validated against gold standard manual review of 300 notes and showed good performance overall (F-score = 0.83). More than 80% of admissions had notes containing information in at least one subcategory. Thirty percent (n = 7,134) contained at least one of five social subcategories indicating elevated potential for being INEADS, and <1% (n = 81) contained at least four, which we classified as high likelihood of being INEADS. Among these, n = 8 admissions had no subcategory indicating reduced likelihood of being INEADS, and appeared to meet the definition of INEADS following manual review. Among the remaining n = 73 who had at least one subcategory indicating reduced likelihood of being INEADS, manual review of a 10% sample showed that most did not appear to be INEADS. Compared with the full cohort, the high likelihood group was significantly more likely to die during hospitalization and within four years, to have Medicaid, to have an emergency admission, and to be male. This investigation demonstrates potential for NLP to identify INEADS patients, and may inform interventions to enhance advance care planning for patients who lack social support.
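The tiering rule described above (counting how many of the five social subcategories appear in an admission's notes, treating at least four as high likelihood, then separating those with and without any reduced-likelihood subcategory) can be sketched as follows. This is a minimal illustration, not the study's implementation: the subcategory labels `social_1` … `social_5`, the `Admission` record, the tier names, and the `f_score` helper are all hypothetical, and the sketch assumes the NLP step has already mapped each admission's notes to sets of matched subcategories. The `f_score` helper simply shows the standard F1 definition (harmonic mean of precision and recall) used for validation against manual review.

```python
from dataclasses import dataclass, field

# Hypothetical labels: the abstract reports five social subcategories
# indicating elevated potential for being INEADS but does not name them here.
ELEVATED_SOCIAL = {"social_1", "social_2", "social_3", "social_4", "social_5"}
HIGH_LIKELIHOOD_THRESHOLD = 4  # "at least four" of the five social subcategories


@dataclass
class Admission:
    """Hypothetical per-admission summary of subcategories matched by the NLP step."""
    admission_id: str
    elevated: set = field(default_factory=set)  # elevated-potential subcategories found
    reduced: set = field(default_factory=set)   # reduced-potential subcategories found


def tier(adm: Admission) -> str:
    """Assign an illustrative likelihood tier from subcategory counts."""
    n_social = len(adm.elevated & ELEVATED_SOCIAL)
    if n_social < HIGH_LIKELIHOOD_THRESHOLD:
        return "any_social" if n_social >= 1 else "none"
    # High-likelihood group: split by whether any reduced-likelihood evidence exists,
    # since those with such evidence mostly did not appear INEADS on manual review.
    return "high_no_reduced" if not adm.reduced else "high_with_reduced"


def f_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1: harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

A design note on this sketch: keeping the elevated and reduced subcategories as separate sets makes the two-stage rule explicit, so the high-likelihood threshold and the reduced-likelihood exclusion can be adjusted independently when routing admissions to manual review.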