Accuracy of a Popular Online Symptom Checker for Ophthalmic Diagnoses

Shen, Carl; Nguyen, Michael; Gregor, Alexander; Isaza, Gloria; Beattie, Anne

doi:10.1001/jamaophthalmol.2019.0571

Cited by 49 publications

(54 citation statements)

References 12 publications

“…This study has gone further than the Semigran study in that it has attempted to contextualise the risk averse behaviour that has previously been seen in this and other studies [9][10][11][12]. This study demonstrates that individuals are being recommended to access services that their symptoms do not warrant by many of the symptom checkers assessed, potentially putting additional pressure on resources and adding undue worry on individuals that they must seek medical care. Although perhaps not surprising given the increasingly litigious nature of healthcare on both sides of the Atlantic, this is a notable concern.…”

Section: Discussion (mentioning)

confidence: 86%

“…Several previous studies 6-7 into the effectiveness of algorithmic performance have found deficiencies in the diagnostic capabilities and a cautious approach to triage. However, only one in 2015 (Semigran et al 8) examined multiple presentations and conditions; the others focussed on single condition studies such as those examining system performance for cervical myelopathy, inflammatory arthritis, HIV / Hepatitis C and ophthalmic conditions 9-12. Given the refinement of existing models and the new entrants into the market since the 2015 study, the current clinical performance of these systems remains unknown.…”

Section: Introduction (mentioning)

confidence: 99%

Accuracy of online symptom checkers and the potential impact on service utilisation

Ceney, Tolond, Glowinski et al. 2020

Preprint

Objectives: The aims of this study are firstly to investigate the diagnostic and triage performance of symptom checkers, secondly to assess their potential impact on healthcare utilisation and thirdly to investigate for variation in performance between systems.

Setting: Publicly available symptom checkers.

Participants: Publicly available symptom-checkers were identified. A standardised set of 50 clinical vignettes was developed and systematically run through each system by a non-clinical researcher.

Primary and secondary outcome measures: System accuracy was assessed by measuring the percentage of times the correct diagnosis was a) listed first, b) within the top five diagnoses listed and c) listed at all. The safety of the disposition advice was assessed by comparing it with national guidelines for each vignette.

Results: Twelve tools were identified and included. Mean diagnostic accuracy of the systems was poor, with the correct diagnosis being listed first on 37.7% (Range 22.2 to 72.0%) of occasions and present in the top five diagnoses on 51.0% (Range 22.2 to 84.0%). 51.0% of systems suggested additional resource utilisation above that recommended by national guidelines (range 18.0% to 61.2%). Both diagnostic accuracy and appropriate resource recommendation varied substantially between systems.

Conclusions: There is wide variation in performance between available symptom checkers and overall performance is significantly below what would be accepted in any other medical field, though some do achieve a good level of accuracy and safety of disposition. External validation and regulation are urgently required to ensure these public facing tools are safe.
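The three accuracy measures named in the abstract above (correct diagnosis listed first, within the top five, or listed at all) can be sketched as a single top-k calculation. This is only an illustration under stated assumptions: the function name `top_k_accuracy` and the vignette data are hypothetical and are not taken from the study.

```python
from typing import Optional, Sequence, Tuple

# One case = (gold-standard diagnosis, ranked list of suggestions from a tool).
Case = Tuple[str, Sequence[str]]


def top_k_accuracy(cases: Sequence[Case], k: Optional[int] = None) -> float:
    """Percentage of cases whose gold diagnosis appears in the first k suggestions.

    k=None means "listed at all" (the whole suggestion list is searched).
    """
    if not cases:
        raise ValueError("at least one case is required")
    hits = 0
    for gold, suggestions in cases:
        considered = suggestions if k is None else suggestions[:k]
        if gold in considered:
            hits += 1
    return 100.0 * hits / len(cases)


# Hypothetical vignette results, invented purely for illustration.
cases = [
    ("conjunctivitis", ["conjunctivitis", "dry eye"]),
    ("migraine", ["tension headache", "migraine", "sinusitis"]),
    ("gout", ["a", "b", "c", "d", "e", "gout"]),
    ("appendicitis", ["gastroenteritis"]),
    ("cellulitis", []),
]

listed_first = top_k_accuracy(cases, k=1)
in_top_five = top_k_accuracy(cases, k=5)
listed_at_all = top_k_accuracy(cases)
```

With this invented data the three measures differ, because one gold diagnosis sits in sixth place and so counts only towards "listed at all".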

“…The reason for lower Mediktor performance in the current study compared with the study in Moreno Barriga et al 7 is not known but it may be related to a different range of conditions or difficulty level than the non-urgent emergency cases presenting to the ED—for example—the vignettes in this study contain many true emergency cases and also many GP or pharmacy/treat-at-home cases which would not be represented by the ED patients included in Moreno Barriga et al 7 . In 2017, a 42-vignette evaluation of WebMD 28 determined its accuracy for ophthalmic condition suggestion: M1 was 26.0% and M3 was 38.0%. Urgency advice based on the top diagnosis was appropriate in 39.0% of emergency cases and 88.0% of non-emergency cases.…”

Section: Discussion (mentioning)

confidence: 99%

How accurate are digital symptom assessment apps for suggesting conditions and urgency advice? A clinical vignettes comparison to GPs

Gilbert, Mehl, Baluch et al. 2020

BMJ Open

Objectives: To compare breadth of condition coverage, accuracy of suggested conditions and appropriateness of urgency advice of eight popular symptom assessment apps.

Design: Vignettes study.

Setting: 200 primary care vignettes.

Intervention/comparator: For eight apps and seven general practitioners (GPs): breadth of coverage and condition-suggestion and urgency advice accuracy measured against the vignettes’ gold-standard.

Primary outcome measures: (1) Proportion of conditions ‘covered’ by an app, that is, not excluded because the user was too young/old or pregnant, or not modelled; (2) proportion of vignettes with the correct primary diagnosis among the top 3 conditions suggested; (3) proportion of ‘safe’ urgency advice (ie, at gold standard level, more conservative, or no more than one level less conservative).

Results: Condition-suggestion coverage was highly variable, with some apps not offering a suggestion for many users: in alphabetical order, Ada: 99.0%; Babylon: 51.5%; Buoy: 88.5%; K Health: 74.5%; Mediktor: 80.5%; Symptomate: 61.5%; Your.MD: 64.5%; WebMD: 93.0%. Top-3 suggestion accuracy was GPs (average): 82.1%±5.2%; Ada: 70.5%; Babylon: 32.0%; Buoy: 43.0%; K Health: 36.0%; Mediktor: 36.0%; Symptomate: 27.5%; WebMD: 35.5%; Your.MD: 23.5%. Some apps excluded certain user demographics or conditions and their performance was generally greater with the exclusion of corresponding vignettes. For safe urgency advice, tested GPs had an average of 97.0%±2.5%. For the vignettes with advice provided, only three apps had safety performance within 1 SD of the GPs (Ada: 97.0%; Babylon: 95.1%; Symptomate: 97.8%). One app had a safety performance within 2 SDs of GPs (Your.MD: 92.6%). Three apps had a safety performance outside 2 SDs of GPs (Buoy: 80.0% (p<0.001); K Health: 81.3% (p<0.001); Mediktor: 87.3% (p=1.3×10⁻³)).

Conclusions: The utility of digital symptom assessment apps relies on coverage, accuracy and safety. While no digital tool outperformed GPs, some came close, and the nature of iterative improvements to software offers scalable improvements to care.
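The "safe" urgency rule in the abstract above (advice at the gold-standard level, more conservative, or no more than one level less conservative) and the comparison of app scores with the GP mean ± SD can be sketched as follows. The integer urgency scale and the function names are assumptions made for illustration; the abstract does not give the actual triage levels, and the reported p-values come from the authors' own analysis, which this sketch does not reproduce.

```python
def is_safe_advice(advised_level: int, gold_level: int) -> bool:
    """Apply the 'safe' rule on a hypothetical ordinal scale.

    Higher numbers mean more urgent (more conservative) advice. Advice is
    safe if it is at least as urgent as the gold standard, or at most one
    level less urgent.
    """
    return advised_level >= gold_level - 1


def sd_distance(score: float, gp_mean: float, gp_sd: float) -> float:
    """Distance of a score from the GP mean, expressed in GP standard deviations."""
    return abs(score - gp_mean) / gp_sd


# Figures quoted in the abstract: GP safe-advice mean 97.0%, SD 2.5%.
GP_MEAN, GP_SD = 97.0, 2.5
app_safety = {
    "Ada": 97.0,
    "Babylon": 95.1,
    "Symptomate": 97.8,
    "Your.MD": 92.6,
    "Buoy": 80.0,
    "K Health": 81.3,
    "Mediktor": 87.3,
}

within_1_sd = [a for a, s in app_safety.items() if sd_distance(s, GP_MEAN, GP_SD) <= 1]
within_2_sd = [a for a, s in app_safety.items() if 1 < sd_distance(s, GP_MEAN, GP_SD) <= 2]
outside_2_sd = [a for a, s in app_safety.items() if sd_distance(s, GP_MEAN, GP_SD) > 2]
```

Grouping the quoted percentages this way gives the same three bands the abstract reports: three apps within 1 SD, one within 2 SDs, and three outside 2 SDs.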

“…The reason for lower Mediktor performance in the current study compared to [7] is not known but it may be related to a different range of conditions or difficulty level than the non-urgent emergency cases presenting to the ED - for example - the vignettes in this study contain many true emergency cases and also many GP or pharmacy/treat-at-home cases which would not be represented by the ED patients included in [7]. In a 2017 42-vignette evaluation of WebMD, [28] determined its accuracy for ophthalmic condition suggestion: M1 was 26.0% and M3 was 38.0%. Urgency advice based on the top diagnosis was appropriate in 39.0% of emergency cases and 88.0% non-emergency cases.…”

Section: Discussion (mentioning)

confidence: 99%

Original research: How accurate are digital symptom assessment apps for suggesting conditions and urgency advice?: a clinical vignettes comparison to GPs

Gilbert, Mehl, Baluch et al. 2020

Preprint

Objectives: To compare breadth of condition coverage, accuracy of suggested conditions and appropriateness of urgency advice of 8 popular symptom assessment apps with each other and with 7 General Practitioners.

Design: Clinical vignettes study.

Setting: 200 clinical vignettes representing real-world scenarios in primary care.

Intervention/comparator: Condition coverage, suggested condition accuracy, and urgency advice performance was measured against the vignettes' gold-standard diagnoses and triage level.

Primary outcome measures: Outcomes included (i) proportion of conditions "covered" by an app, i.e. not excluded because the patient was too young/old, pregnant, or comorbid, (ii) proportion of vignettes in which the correct primary diagnosis was amongst the top 3 conditions suggested, and (iii) proportion of "safe" urgency level advice (i.e. at gold standard level, more conservative, or no more than one level less conservative).

Results: Condition-suggestion coverage was highly variable, with some apps not offering a suggestion for many users: in alphabetical order, Ada: 99.0%; Babylon: 51.5%; Buoy: 88.5%; K Health: 74.5%; Mediktor: 80.5%; Symptomate: 61.5%; Your.MD: 64.5%. The top-3 suggestion accuracy (M3) of GPs was on average 82.1±5.2%. For the apps it was Ada: 70.5%; Babylon: 32.0%; Buoy: 43.0%; K Health: 36.0%; Mediktor: 36.0%; Symptomate: 27.5%; WebMD: 35.5%; Your.MD: 23.5%. Some apps exclude certain user groups (e.g. younger users) or certain conditions; for these apps condition-suggestion performance is generally greater with exclusion of these vignettes. For safe urgency advice, tested GPs had an average of 97.0±2.5%. For the vignettes with advice provided, only three apps had safety performance within 1 S.D. of the GPs (mean) (Ada: 97.0%; Babylon: 95.1%; Symptomate: 97.8%). One app had a safety performance within 2 S.D.s of GPs (Your.MD: 92.6%). Three apps had a safety performance outside 2 S.D.s of GPs (Buoy: 80.0% (p<0.001); K Health: 81.3% (p<0.001); Mediktor: 87.3% (p=1.3×10⁻³)).

Conclusions: The utility of digital symptom assessment apps relies upon coverage, accuracy, and safety. While no digital tool outperformed GPs, some came close, and the nature of iterative improvements to software offers scalable improvements to care.
