BACKGROUND
Communication barriers in healthcare, particularly between physicians and patients with different linguistic and cultural backgrounds, negatively impact patient care. Studies show that linguistic and cultural differences contribute to disparities in health outcomes, notably in the African American community. Patient-physician language concordance is linked to better health outcomes, highlighting the need for healthcare systems to consider patients' unique linguistic and cultural backgrounds.
OBJECTIVE
This study investigates whether Large Language Models (LLMs), specifically GPT-4, can simulate patients who speak African American Vernacular English (AAVE). By assessing GPT-4's capability to mimic AAVE, the study seeks to help bridge linguistic and cultural communication gaps in healthcare and to support a more culturally sensitive and inclusive healthcare system.
METHODS
The study involved simulating patient-physician interactions using GPT-4. We crafted prompts incorporating medical cases, demographic variables, and linguistic features, progressively increasing complexity. The prompts were based on scenarios from the United States Medical Licensing Examination (USMLE) Computer-Based Case Simulations (CCS). Diagnostic questions formulated by healthcare professionals were posed to the simulated patients, and responses were analyzed for AAVE linguistic features.
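The progressive prompt design described above can be sketched roughly as follows. This is a minimal illustration, not the study's actual prompts: the function name `build_prompts`, the `CaseOnly` label, the prompt wording, and the example feature names are all hypothetical, and the sketch assumes that every variant includes the medical case. Only the labels DemoP, LingP, and CompP come from the study.

```python
def build_prompts(case, demographic, features):
    """Return prompt variants of increasing complexity for one medical case.

    All wording here is illustrative; the study's real prompt text is not
    reproduced. Assumes every variant contains the medical case.
    """
    base = (
        "You are a patient in the following medical case. "
        "Answer the physician's questions in the first person.\n"
        f"Case: {case}"
    )
    demo = f"Patient background: {demographic}."
    ling = "Use these linguistic features when you speak: " + "; ".join(features) + "."
    return {
        "CaseOnly": base,                     # hypothetical baseline label
        "DemoP": f"{base}\n{demo}",           # case + demographic variable
        "LingP": f"{base}\n{ling}",           # case + explicit linguistic features
        "CompP": f"{base}\n{demo}\n{ling}",   # case + demographic + linguistic features
    }


# Placeholder inputs, not taken from the study's materials.
prompts = build_prompts(
    case="chest pain for two hours",
    demographic="African American",
    features=["habitual be", "copula deletion"],
)
```

Each variant would then be sent to the model together with the same diagnostic questions, so that differences in the responses can be attributed to the prompt components.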
RESULTS
Our research indicates that GPT-4 consistently exhibits AAVE characteristics in response to various prompts. Notably, the most comprehensive prompt (CompP), which integrates the medical case, demographic variable, and linguistic features, is particularly effective in eliciting AAVE features in GPT-4's responses. Interestingly, the prompt that includes only a demographic variable (DemoP) is more effective than the one with explicit linguistic feature details (LingP) in eliciting phonological features. This suggests an inherent association in GPT-4 between the African American demographic and specific phonological attributes. Furthermore, GPT-4 can generate "out-of-list features", which are linguistic behaviors not explicitly requested in the prompts. However, GPT-4 exhibits limitations in simulating certain AAVE features. These limitations are particularly evident in constructing questions involving specific inversion rules, in existential and locative constructions, and in the use of unique AAVE lexical items.
CONCLUSIONS
This study underscores GPT-4's proficiency in simulating linguistic behaviors associated with specific demographic groups, particularly speakers of AAVE. Such capability could help bridge linguistic gaps in healthcare and enhance communication. Our findings suggest the potential of AI systems like GPT-4 to serve as practical training tools for medical professionals, improving their interaction with diverse patient populations. Future research should include a broader range of demographic and sociolect factors and focus on adapting medical terminology to various levels of patient health literacy. Developing customized language models that are sensitive to linguistic and cultural nuances could play a pivotal role in reducing communication barriers, leading to better patient outcomes and a more equitable healthcare system.