BACKGROUND
Tools that help identify preoperative patients who need further cardiovascular testing or consultation may reduce costs and ensure rational utilization of resources.
OBJECTIVE
We evaluated the feasibility of using general-purpose versus domain-specific large language models (LLMs) for a classification task aimed at identifying these surgical patients.
METHODS
This study leveraged various LLMs to classify patients who would need preoperative cardiac evaluation based on their preoperative clinical notes. Three general-purpose models (BERT, RoBERTa, Longformer) and two domain-specific models (BioClinicalBERT, PubMedBERT) were trained on this classification task. Performance was validated on the test set, and the area under the receiver operating characteristic curve (AUC), F1-score, sensitivity, specificity, precision, and recall were measured.
RESULTS
There were 175 patients, of whom 67 (38.2%) were determined to require preoperative cardiac evaluation or testing. The dataset was divided into a training set (75%, n=131) and a test set (25%, n=44). All models performed similarly: the AUC was highest with Longformer (0.90) and the precision-recall score was highest with PubMedBERT (0.88).
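The 75%/25% split reported above can be reproduced with scikit-learn's `train_test_split`. The notes and labels below are stand-ins (175 placeholder records, 67 positive), not the study's data; a stratified split is assumed here to preserve the class balance, though the abstract does not state whether stratification was used.

```python
# Sketch of the dataset split: 175 records -> 131 train / 44 test.
# X and y are placeholders, not real clinical data.
from sklearn.model_selection import train_test_split

X = [f"note_{i}" for i in range(175)]  # placeholder preoperative notes
y = [1] * 67 + [0] * 108               # 67 positive (38.2%), 108 negative

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
# len(X_train) == 131, len(X_test) == 44 (test size rounds up: ceil(175 * 0.25) = 44)
```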
CONCLUSIONS
This study described the use of three general-purpose and two domain-specific LLMs to classify surgical patients in need of preoperative cardiovascular workup. All LLMs had excellent yet similar performance. LLMs may be applied to preoperative clinical notes to classify which patients would benefit from preoperative cardiology evaluation. No clinically significant differences were observed between domain-specific and general-purpose LLMs.