Limited health literacy can be a barrier to healthcare delivery, but widespread classification of patient health literacy is challenging. We applied natural language processing and machine learning to a large sample of 283,216 secure messages sent by 6,941 patients to their clinicians to develop and validate literacy profiles as indicators of patients' health literacy. All patients were participants in Kaiser Permanente Northern California's DISTANCE Study. We created three literacy profiles and compared the performance of each against a gold standard of patient self-report. We also analyzed associations between the literacy profiles and patient demographics, health outcomes and healthcare utilization. T-tests were used for numeric data such as A1C, Charlson comorbidity index and healthcare utilization rates, and chi-square tests for categorical data such as sex, race, continuous medication gaps and severe hypoglycemia. Literacy profiles varied in their test characteristics, with C-statistics ranging from 0.61 to 0.74. Relationships between literacy profiles and health outcomes revealed patterns consistent with previous health literacy research: patients identified via literacy profiles as having limited health literacy were older and more likely to belong to a minority group; had poorer medication adherence [...] represents the first successful attempt to use natural language processing and machine learning to measure health literacy. Literacy profiles offer an automated and economical way to identify patients with limited health literacy and a greater vulnerability to poor health outcomes.

BACKGROUND AND SIGNIFICANCE

An estimated 30.3 million people in the U.S. had diabetes mellitus (DM) in 2015 according to the Centers for Disease Control and Prevention (2017). As with most chronic conditions, DM self-management can be complex and requires that patients frequently communicate with healthcare providers.
Health literacy (HL) is generally defined as a patient's ability to obtain, process, comprehend and communicate basic health information [1, 2].
DM patients with limited HL have a higher risk of poor health outcomes, including worse blood sugar control, higher complication rates [3], and a greater incidence of hypoglycemia [4, 5]. Poor communication and suboptimal adherence to medication may explain some of these disparities [6, 7]. Limited HL contributes to preventable suffering, more rapid decline in physical function [8] and related healthcare costs. Online patient portals embedded within electronic health records (EHRs) are now widely used to bridge in-person encounters and provide support between visits by allowing patients and providers to communicate via secure messages (SMs). The reach and effectiveness of online communication are likely heavily affected by patients' HL.
Limited HL is a barrier to use of patient portals and impacts patients' evaluation of online health information [9].
However, no research has harnessed S...