Objectives To compare breadth of condition coverage, accuracy of suggested conditions and appropriateness of urgency advice of 8 popular symptom assessment apps with each other and with 7 general practitioners (GPs).
Design Clinical vignettes study.
Setting 200 clinical vignettes representing real-world scenarios in primary care.
Intervention/comparator Condition coverage, suggested condition accuracy, and urgency advice performance were measured against the vignettes' gold-standard diagnoses and triage level.
Primary outcome measures Outcomes included (i) proportion of conditions "covered" by an app, i.e. not excluded because the patient was too young/old, pregnant, or comorbid, (ii) proportion of vignettes in which the correct primary diagnosis was amongst the top 3 conditions suggested, and (iii) proportion of "safe" urgency level advice (i.e. at gold standard level, more conservative, or no more than one level less conservative).
Results Condition-suggestion coverage was highly variable, with some apps not offering a suggestion for many users: in alphabetical order, Ada: 99.0%; Babylon: 51.5%; Buoy: 88.5%; K Health: 74.5%; Mediktor: 80.5%; Symptomate: 61.5%; Your.MD: 64.5%. The top-3 suggestion accuracy (M3) of GPs was on average 82.1±5.2%. For the apps it was: Ada: 70.5%; Babylon: 32.0%; Buoy: 43.0%; K Health: 36.0%; Mediktor: 36.0%; Symptomate: 27.5%; WebMD: 35.5%; Your.MD: 23.5%. Some apps exclude certain user groups (e.g. younger users) or certain conditions; for these apps, condition-suggestion performance is generally greater when these vignettes are excluded. For safe urgency advice, tested GPs had an average of 97.0±2.5%. For the vignettes with advice provided, only three apps had safety performance within 1 S.D. of the GP mean: Ada: 97.0%; Babylon: 95.1%; Symptomate: 97.8%. One app had a safety performance within 2 S.D.s of GPs: Your.MD: 92.6%. Three apps had a safety performance outside 2 S.D.s of GPs: Buoy: 80.0% (p<0.001); K Health: 81.3% (p<0.001); Mediktor: 87.3% (p=1.3×10⁻³).
Conclusions The utility of digital symptom assessment apps relies upon coverage, accuracy, and safety. While no digital tool outperformed GPs, some came close, and the iterative nature of software improvement offers scalable improvements to care.
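The outcome measures defined above can be illustrated with a minimal Python sketch. It is not the study's analysis code. It assumes urgency levels are encoded as integers where a higher number means more urgent (more conservative) advice, and that an app giving no advice is recorded as `None`; the function names `top3_accuracy`, `is_safe`, `safe_proportion`, and `sd_band` are hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence


def top3_accuracy(suggestions: Sequence[Sequence[str]], gold: Sequence[str]) -> float:
    """Proportion of vignettes whose gold-standard diagnosis is in the top 3 suggestions."""
    hits = sum(1 for ranked, g in zip(suggestions, gold) if g in ranked[:3])
    return hits / len(gold)


def is_safe(advised: int, gold: int) -> bool:
    """Safe = at gold-standard level, more conservative, or at most one level less conservative.

    Assumption: integer levels, higher = more urgent.
    """
    return advised >= gold - 1


def safe_proportion(advised: Sequence[Optional[int]], gold: Sequence[int]) -> float:
    """Proportion of safe advice, counting only vignettes where advice was provided."""
    pairs = [(a, g) for a, g in zip(advised, gold) if a is not None]
    if not pairs:
        raise ValueError("no vignettes with advice provided")
    return sum(is_safe(a, g) for a, g in pairs) / len(pairs)


def sd_band(app_pct: float, gp_mean: float, gp_sd: float) -> str:
    """Classify an app's percentage by its distance from the GP mean in GP standard deviations."""
    distance = abs(app_pct - gp_mean)
    if distance <= gp_sd:
        return "within 1 SD"
    if distance <= 2 * gp_sd:
        return "within 2 SD"
    return "outside 2 SD"


# Safety percentages and GP mean/S.D. as reported in the Results above.
reported_safety = {
    "Ada": 97.0, "Babylon": 95.1, "Symptomate": 97.8,
    "Your.MD": 92.6, "Buoy": 80.0, "K Health": 81.3, "Mediktor": 87.3,
}
bands = {app: sd_band(pct, gp_mean=97.0, gp_sd=2.5) for app, pct in reported_safety.items()}
```

Applied to the reported figures, `sd_band` places each app in the same band as stated in the Results. The p-values in the abstract come from the study's own statistical tests, which this sketch does not attempt to reproduce.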