Background and objective: The current periprosthetic joint infection (PJI) diagnostic guidelines require clinicians to interpret and integrate multiple criteria into a complex scoring system. Moreover, the resulting PJI classifications are often inconclusive and fail to provide a clinical diagnosis. Machine learning (ML) models could be leveraged to reduce reliance on these complex systems and thereby reduce diagnostic uncertainty. This study aimed to develop an ML algorithm using synovial fluid (SF) test results to establish a PJI probability score.
Methods: We used a large clinical laboratory's dataset of SF samples, aspirated from patients with hip or knee arthroplasty as part of a PJI evaluation. Patient age and SF biomarkers [white blood cell count, neutrophil percentage (%PMN), red blood cell count, absorbance at a wavelength of 280 nm, C-reactive protein (CRP), alpha-defensin (AD), neutrophil elastase, and microbial antigen (MID) tests] were used for model development. Data preprocessing, principal component analysis, and unsupervised clustering (K-means) revealed four clusters of samples that naturally aggregated based on biomarker results. Analysis of the characteristics of these four clusters revealed three clusters (n=13,133) of samples with biomarker results typical of a PJI-negative classification and one cluster (n=4,032) of samples with biomarker results typical of a PJI-positive classification. A decision tree model, trained and tested independently of external diagnostic rules, was then developed to match the classification determined by the unsupervised clustering. The performance of the model was assessed against modified 2018 International Consensus Meeting (ICM) criteria, in both the test cohort and an independent unlabeled validation set of 5,601 samples. The SHAP (SHapley Additive exPlanations) method was used to explore feature importance.
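The cluster-then-classify workflow described in the Methods can be illustrated with a minimal sketch. It assumes scikit-learn and NumPy and runs on synthetic data; the three-feature matrix, the cluster centers, the rule used to pick the "PJI-positive-like" cluster, and the tree depth are all hypothetical choices made for illustration, not the study's data, features, or settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for a biomarker matrix (hypothetical values, 3 features).
# Three groups mimic "PJI-negative-like" samples; one group is elevated
# on every feature and mimics "PJI-positive-like" samples.
centers = np.array([
    [0.0, 0.0, 0.0],
    [0.0, 4.0, 0.0],
    [4.0, 0.0, 0.0],
    [12.0, 12.0, 12.0],
])
X = np.vstack([c + rng.normal(scale=0.5, size=(200, 3)) for c in centers])

# Step 1: preprocessing and dimensionality reduction.
X_scaled = StandardScaler().fit_transform(X)
X_pca = PCA(n_components=2, random_state=0).fit_transform(X_scaled)

# Step 2: unsupervised clustering into four groups.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X_pca)
clusters = kmeans.labels_

# Step 3: characterise the clusters. Assumption for this sketch only:
# the cluster with the highest overall mean biomarker level is treated
# as the "positive-like" cluster; the other three are pooled as negative.
cluster_means = np.array([X[clusters == k].mean() for k in range(4)])
positive_cluster = int(cluster_means.argmax())
y = (clusters == positive_cluster).astype(int)

# Step 4: train a decision tree to reproduce the cluster-derived labels,
# then evaluate it on held-out samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
accuracy = tree.score(X_test, y_test)

# The predicted class probability plays the role of a probability score.
positive_probability = tree.predict_proba(X_test)[:, 1]
print(f"held-out agreement with cluster labels: {accuracy:.3f}")
```

In this sketch the labels come only from the clustering step, so no external diagnostic rule enters training; a reference standard would be used only afterwards, to assess the fitted model.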
Results: The ML model showed an area under the curve of 0.993, with a sensitivity of 98.8%, specificity of 97.3%, positive predictive value (PPV) of 92.9%, and negative predictive value (NPV) of 99.8% in predicting the modified 2018 ICM diagnosis among test set samples. The model maintained its diagnostic accuracy in the validation cohort, yielding 99.1% sensitivity, 97.1% specificity, 91.9% PPV, and 99.9% NPV. The model's inconclusive rate (diagnostic probability between 20% and 80%) in the validation cohort was only 1.3%, lower than that observed with the modified 2018 ICM PJI classification (7.4%; p<0.001). The SHAP analysis found that AD was the most important feature in the model, exhibiting dominance among >95% of "infected" and "not infected" diagnoses. Other important features were the sum of the MID test panel, %PMN, and SF-CRP.
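For readers less familiar with the reported metrics, the sketch below shows how sensitivity, specificity, PPV, NPV, and an inconclusive rate are computed from confusion-matrix counts and predicted probabilities. The counts and probabilities are illustrative placeholders, not the study's data, and the helper names and the treatment of the 20% and 80% boundaries as inclusive are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of model predictions against a reference diagnosis."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return the four standard diagnostic accuracy metrics as fractions."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


def inconclusive_rate(probabilities: list[float],
                      low: float = 0.20, high: float = 0.80) -> float:
    """Fraction of samples whose probability falls in [low, high].

    Whether the band endpoints are inclusive is an assumption here.
    """
    inconclusive = [p for p in probabilities if low <= p <= high]
    return len(inconclusive) / len(probabilities)


# Illustrative counts only (hypothetical, not taken from the study).
counts = ConfusionCounts(tp=90, fp=20, tn=180, fn=10)
metrics = diagnostic_metrics(counts)

# Illustrative probability scores only (hypothetical).
scores = [0.01, 0.05, 0.50, 0.95, 0.99, 0.70, 0.02, 0.98, 0.03, 0.97]
rate = inconclusive_rate(scores)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(f"inconclusive rate: {rate:.2f}")
```

PPV and NPV depend on the prevalence of infection in the evaluated cohort, whereas sensitivity and specificity do not, which is why all four are usually reported together.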