Using text analysis software to identify determinants of inappropriate clinical question reporting and diagnostic procedure referrals in Reggio Emilia, Italy
Abstract
Background
Inappropriate prescribing of diagnostic procedures leads to overdiagnosis, overtreatment and resource waste in healthcare systems. Effective strategies to measure and to overcome inappropriateness are essential to increasing the value and sustainability of care.
We aimed to describe the determinants of inappropriate reporting of the clinical question and of inappropriate imaging and endoscopy referrals through an analysis of general practitioners’ (GP) referral forms …
“…Venturelli et al assessed the appropriateness of referrals for imaging and other diagnostic procedures by analysing the requests’ content using a commercial software package. 41 Their study’s focus was on classification results (appropriate vs. not appropriate) and not the applied method’s performance, which impeded comparison with our study. However, the study demonstrated the feasibility of using requests for assessing appropriateness and is an excellent example of a future application for transformer-based NLP.…”
Background
Radiology requests and reports contain valuable information about diagnostic findings and indications, and transformer-based language models are promising for more accurate text classification.
Methods
In a retrospective study, 2256 radiologist-annotated radiology requests (8 classes) and reports (10 classes) were divided into training and testing datasets (90% and 10%, respectively) and used to train 32 models. Performance metrics were compared by model type (LSTM, BERTje, RobBERT, BERT-clinical, BERT-multilingual, BERT-base), text length, data prevalence, and training strategy. The best models were then used to predict the categories of the remaining 40,873 cases in the request and report datasets.
Results
The RobBERT model performed best after 4000 training iterations, with AUC values ranging from 0.808 [95% CI (0.757–0.859)] to 0.976 [95% CI (0.956–0.996)] for the requests and from 0.746 [95% CI (0.689–0.802)] to 1.0 [95% CI (1.0–1.0)] for the reports. The AUC for the classification of normal reports was 0.95 [95% CI (0.922–0.979)]. The predicted data showed variability both in diagnostic yield across request classes and in request patterns related to COVID-19 hospital admission data.
Conclusion
Transformer-based natural language processing is feasible for the multilabel classification of chest imaging request and report items. Diagnostic yield varies with the information in the requests.
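As a rough illustration of what a per-class AUC with a 95% confidence interval involves, the sketch below computes the AUC from the Mann-Whitney rank statistic and attaches a Hanley-McNeil normal-approximation interval. The abstract does not say how the intervals were obtained, so the choice of approximation is an assumption, and the names auc_with_ci and per_class_auc are hypothetical helpers rather than anything from the study.

```python
import math


def auc_with_ci(labels, scores, z=1.96):
    """Return (auc, lower, upper) for one binary class.

    AUC is the fraction of positive/negative pairs ranked correctly
    (ties count half). The interval uses the Hanley-McNeil standard
    error, which is an assumed choice for this sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    n_pos, n_neg = len(pos), len(neg)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")

    # Count correctly ordered pairs; a tie contributes half a win.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    auc = wins / (n_pos * n_neg)

    # Hanley-McNeil variance terms.
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    se = math.sqrt(max(var, 0.0))

    # Clip the interval to the valid AUC range [0, 1].
    return auc, max(0.0, auc - z * se), min(1.0, auc + z * se)


def per_class_auc(y_true, y_score, class_names):
    """One-vs-rest AUC for a multilabel problem.

    y_true and y_score are lists of rows, one column per class;
    y_true holds 0/1 labels and y_score holds model scores.
    """
    results = {}
    for j, name in enumerate(class_names):
        labels = [row[j] for row in y_true]
        scores = [row[j] for row in y_score]
        results[name] = auc_with_ci(labels, scores)
    return results
```

Under this approximation, a class that the model separates perfectly has zero estimated variance, so its interval collapses to a single point, which is one way an interval such as 1.0 (1.0–1.0) can arise.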
“…This study supports these concepts. Other studies suggest and explore new techniques, such as text analysis software and other types of automated language processing, in evaluations [18–20]. This could be a way forward in the future.…”
Objectives
The numbers of computed tomography (CT) and magnetic resonance imaging (MRI) examinations per capita continue to increase in Sweden and in other parts of Europe. The appropriateness of CT and MRI examinations was audited using established European appropriateness criteria. Alternative modalities were also explored. The results were compared with those of a previous study performed in Sweden.
Methods
A semi-automatic retrospective evaluation of referrals from examinations performed in four healthcare regions using the European appropriateness criteria in ESR iGuide was undertaken. The clinical indications from a total of 13,075 referrals were assessed against these criteria. The ESR iGuide was used to identify alternative modalities resulting in a higher degree of appropriateness. A qualitative comparison with re-evaluated results from the previous study was made.
Results
The appropriateness was higher for MRI examinations than for CT examinations with procedures classed as usually appropriate for 76% and 63% of the examinations, respectively. The degree of appropriateness for CT was higher for referrals from hospitals compared to those from primary care centres. The opposite was found for MRI examinations. The alternative modalities that would result in higher appropriateness included all main imaging modalities. The result for CT did not show improvement compared with the former study.
Conclusions
A high proportion of both CT and MRI examinations were inappropriate. The study indicates that 37% of CT examinations and 24% of MRI examinations were inappropriate and that the appropriateness for CT has not improved in the last 15 years.
Critical relevance statement
A high proportion of CT and MRI examinations in this retrospective study using evidence-based referral guidelines were inappropriate.
Key points
∙ A high proportion of CT and MRI examinations were inappropriate.
∙ The CT referrals from general practitioners were less appropriate than those from hospital specialists.
∙ The MRI referrals from hospital specialists were less appropriate than those from general practitioners.
∙ Adherence to radiological appropriateness guidelines may improve the appropriateness of conducted examinations.
“…While responses may have been influenced by differing interpretations of the question posed as to whether this related to real life practice or to the existence of regulation, it is clear that advance justification is not practically being performed for all CT examinations, given the volume of respondents (47%) who reported that advance justification was performed sometimes or mostly. Although a minority (n = 4) stated that referrals were justified at the point of referral, this could have significant implications for the appropriate use of CT for patients without radiology practitioner oversight, particularly when one considers the poor usage and knowledge of referral guidelines already cited by others [6,8,10,17]. A previous HERCA report [18] following an inspection week in 2016 reported broadly similar findings, with as many as 26% of facilities not performing a satisfactory evaluation of the referral before the examinations were performed and even more not rejecting unjustified examinations (31%) or fully proving that the examinations were authorised by the radiological practitioner (35%).…”
Section: Discussion
“…Appropriate imaging referrals not only reduce the radiation exposure of the population, but importantly also save valuable healthcare resources. However, numerous publications point to a less than ideal level of justification in current practice with national audits reporting up to 39% [5–7] of CT examinations not being justified and even higher rates reported across smaller studies [8–11].…”
Objectives
Published literature on justification of computed tomography (CT) examinations in Europe is sparse but demonstrates consistent sub-optimal application. As part of the EU-initiated CT justification project, this work set out to capture CT justification practices across Europe.
Methods
An electronic questionnaire consisting of mostly closed multiple-choice questions was distributed to national competent authorities and to presidents of European radiology societies in EU member states as well as Iceland, Norway, Switzerland, and the UK (n = 31).
Results
Fifty-one responses were received from 30 European countries. Just 47% (n = 24) stated that advance justification of individual CT examinations is performed by a medical practitioner. Daily justification of CT referrals is mostly performed by radiologists alone (n = 27, 53%), although this is a shared responsibility in many countries. Imaging referral guidelines are widely available although just 13% (n = 6) consider them in daily use. Four countries (Cyprus, Ireland, Sweden, UK) reported having them embedded within clinical decision support systems. Justification of new practices with CT is mostly regulated (77%) although three countries (Belgium, Iceland and Portugal) reported not having any national system in place for generic justification. Health screening with CT was reported by seven countries as part of approved screening programmes and by eight countries outside such programmes. When performed, CT justification audits were reported to improve CT justification rates.
Conclusions
CT justification practices vary across Europe with less than 50% using advance justification and a minority having clinical decision support systems in place. CT for health screening purposes is not currently widely used in Europe.