2023
DOI: 10.1101/2023.11.19.23298727
Preprint

ChatGPT for assessing risk of bias of randomized trials using the RoB 2.0 tool: A methods study

Tyler Pitre, Tanvir Jassal, Jhalok Ronjan Talukdar, et al.

Abstract: Background: The assessment of risk of bias is a critical component of systematic review methods. Assessing risk of bias, however, can be time- and resource-intensive. AI-based solutions may increase efficiency and reduce burden. Objective: To evaluate the reliability of ChatGPT for performing risk of bias assessments of randomized trials. Methods: We sampled recently published Cochrane systematic reviews of medical interventions (up to October 2023) that included randomized controlled trials and assessed risk of bias …

Cited by 6 publications (5 citation statements)
References 56 publications (79 reference statements)

“…Just recently, other authors have used LLM to conduct RoB assessment, with mixed results. Pitre et al (34) found comparably low agreement between ChatGPT-4 and Cochrane authors when assessing RoB of 157 RCTs from 34 Cochrane Reviews using RoB 2 (Cohen's κ of 0.16 for the overall assessment). Testing the use of ChatGPT (GPT-4) for RoB assessment of non-randomized studies of intervention using ROBINS-I (65), Hasan et al (35) also obtained only slight agreement (Cohen's κ of 0.13 for the overall assessment).…”
Section: Discussion (mentioning)
confidence: 99%
“…Currently, there are very limited methods to support RoB assessment using ML (5). However, also using ChatGPT alone for RoB assessments seems not recommendable, neither for RCTs (33,34) nor for non-randomized studies of interventions (35), due to limited agreement in RoB judgements between ChatGPT and humans.…”
Section: Introduction (mentioning)
confidence: 99%
“…33 Researchers are exploring the capabilities of GPT-4 in various scholarly tasks, 34 including reviewing scientific papers, [35][36][37] implementing edits based on reviewer comments, 38 and systematic review tasks, such as article screening, data extraction, 39,40 and assessing the risk of bias in included studies. 41 While GPT-4 has shown satisfactory performance in some of these tasks, the results in others have been less than optimal.…”
Section: Detecting Changes In Outcomes (mentioning)
confidence: 99%
“…with an estimated review time of 10-15 minutes per trial. However, automated tools such as RobotReviewer can streamline the extraction and evaluation process in batches [51][52][53], improving efficiency-though manual verification is still necessary. Additionally, chatbots based on LLMs can aid in risk of bias assessment (see Figure S8), and studies indicate that their accuracy is comparable to human evaluations [23].…”
Section: Assess the Risk Of Bias (mentioning)
confidence: 99%
“…For randomized controlled trials, tools such as Risk of Bias (RoB) [ 62 ] or its updated version RoB 2 [ 63 ] are typically used, with an estimated review time of 10-15 minutes per trial. However, automated tools such as RobotReviewer can streamline the extraction and evaluation process in batches [ 51 - 53 ], thereby improving efficiency, although manual verification is still necessary. Additionally, chatbots based on LLMs can aid in risk of bias assessment (see Multimedia Appendix 8 ), and their accuracy appears to be comparable to that of human evaluations [ 23 ].…”
Section: Potential Roles Of LLMs In Producing Systematic Reviews and ... (mentioning)
confidence: 99%
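
Two of the citation statements above report agreement as Cohen's κ (0.16 and 0.13 for the overall assessment), values that fall in the 0.00 to 0.20 band conventionally labeled "slight" agreement. As a reading aid, the following is a minimal sketch of how an unweighted κ could be computed for two raters. The function name `cohens_kappa` and the six toy judgements are hypothetical illustrations, not data or code from the preprint or the citing papers, and the original studies may have used a weighted variant or a statistics package instead.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must judge the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: share of items on which the two raters give the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, the product of the two raters' marginal
    # proportions, summed over all labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical toy judgements (NOT taken from the study): overall RoB 2 ratings
# for six trials by a language model and by human review authors.
llm_ratings = ["low", "low", "high", "some concerns", "low", "high"]
human_ratings = ["low", "high", "high", "some concerns", "some concerns", "low"]

kappa = cohens_kappa(llm_ratings, human_ratings)
print(f"kappa = {kappa:.2f}")
```

In the toy data the raters agree on 3 of 6 trials (observed agreement 0.50), while their label frequencies alone would produce agreement of about 0.33 by chance, so κ = (0.50 − 0.33) / (1 − 0.33) = 0.25. This shows why κ can be low even when raw agreement looks moderate.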