INTRODUCTION AND OBJECTIVE: Small renal masses (SRM) remain a diagnostic challenge despite advances in imaging and renal biopsy: approximately 20% of SRM removed at surgery prove to be benign on post-operative histology. DNA methylation is an early and stable event in tumorigenesis that reflects cell ontology. We developed and validated a machine learning model, leveraging DNA methylation analysis in >1500 patient samples, to classify the pathological subtypes of renal tumours and provide a platform to improve diagnosis.

METHODS: Methylation was evaluated in genomic DNA (gDNA) from fresh frozen nephrectomy specimens using the Illumina 450K microarray and EPIC Methyl Capture sequencing platforms. We combined our own dataset with The Cancer Genome Atlas (TCGA) data (N ≈ 158,000 markers). The combined dataset was divided into training and testing sets using 4-fold cross-validation and used to train an XGBoost model to classify samples into one of five pathological subtypes. External validation was carried out on two independent, publicly available datasets: fresh frozen nephrectomy tissue (Brennan et al) and ex-vivo tissue biopsies (Chopra et al).

RESULTS: The integrated training and testing set consisted of 1268 samples, including 465 clear cell renal cell carcinomas (ccRCC), 303 papillary RCC (pRCC), 101 chromophobe RCC (chRCC), 73 oncocytomas and 329 adjacent normal kidney samples. The prediction accuracy in the testing set was 0.954, with high class-wise areas under the receiver operating characteristic curve (ROC AUCs): 0.994 (ccRCC), 0.995 (pRCC), 0.982 (chRCC), 0.999 (oncocytoma) and 1.000 (adjacent normal). The model was externally validated on 245 ex-vivo tissue biopsies from 100 renal tumours (Chopra et al), achieving an accuracy of 0.824 and average class-wise ROC AUCs of 0.978 (ccRCC), 0.946 (pRCC), 0.994 (chRCC), 0.892 (oncocytoma) and 0.969 (adjacent normal). A second independent external validation on 34 tissue samples (Brennan et al) achieved average class-wise ROC AUCs of 1.000 (ccRCC), 0.904 (chRCC), 0.946 (oncocytoma) and 0.976 (adjacent normal).
To assess the impact of methylation heterogeneity on the classifier, the model was also evaluated on 97 multi-region samples from 18 ccRCC patients.

CONCLUSIONS: A machine learning model based on DNA methylation data can differentiate the pathological subtypes of renal tumours with excellent accuracy, including on external validation. Applied to renal biopsy samples, the model may improve the diagnostic pathway for renal tumours, with the potential to reduce overtreatment by preventing unnecessary surgery, improve patient quality of life (QoL) and minimise healthcare costs.
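The results pair overall accuracy with class-wise ROC AUCs. As an illustration only, the sketch below shows one common way such figures can be computed, assuming that "class-wise" means one-vs-rest AUC on predicted class probabilities (the abstract does not specify this). It is plain Python, not the authors' pipeline: the XGBoost model and cross-validation are not reproduced, the helper names (`binary_roc_auc`, `class_wise_auc`, `accuracy`) are hypothetical, and the probability matrix is invented toy data, not study data.

```python
from typing import Dict, List, Sequence

# Hypothetical helpers: a sketch of one-vs-rest (OvR) evaluation, assuming
# "class-wise ROC AUC" means OvR AUC computed on predicted class probabilities.


def binary_roc_auc(scores: Sequence[float], positives: Sequence[bool]) -> float:
    """AUC = probability that a random positive scores above a random negative (ties count 0.5)."""
    pos = [s for s, p in zip(scores, positives) if p]
    neg = [s for s, p in zip(scores, positives) if not p]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def class_wise_auc(probs: List[List[float]], labels: Sequence[str],
                   classes: Sequence[str]) -> Dict[str, float]:
    """One-vs-rest AUC per class: column j of probs scores membership of classes[j]."""
    return {
        cls: binary_roc_auc([row[j] for row in probs], [y == cls for y in labels])
        for j, cls in enumerate(classes)
    }


def accuracy(probs: List[List[float]], labels: Sequence[str],
             classes: Sequence[str]) -> float:
    """Fraction of samples whose highest-probability class equals the true label."""
    preds = [classes[max(range(len(classes)), key=lambda j: row[j])] for row in probs]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Toy, invented probabilities (NOT study data); rows are samples, columns follow CLASSES.
# "normal" stands for adjacent normal kidney.
CLASSES = ["ccRCC", "pRCC", "chRCC", "oncocytoma", "normal"]
probs = [
    [0.70, 0.10, 0.10, 0.05, 0.05],  # true ccRCC
    [0.20, 0.60, 0.10, 0.05, 0.05],  # true pRCC
    [0.10, 0.10, 0.40, 0.30, 0.10],  # true chRCC
    [0.05, 0.05, 0.50, 0.35, 0.05],  # true oncocytoma, predicted chRCC
    [0.05, 0.05, 0.05, 0.05, 0.80],  # true adjacent normal
]
labels = ["ccRCC", "pRCC", "chRCC", "oncocytoma", "normal"]

aucs = class_wise_auc(probs, labels, CLASSES)
print(accuracy(probs, labels, CLASSES))  # 0.8
print(aucs)  # {'ccRCC': 1.0, 'pRCC': 1.0, 'chRCC': 0.75, 'oncocytoma': 1.0, 'normal': 1.0}
print(sum(aucs.values()) / len(aucs))  # macro average: 0.95
```

In this toy example the oncocytoma sample is misclassified as chRCC, so accuracy falls to 0.8, yet the oncocytoma AUC stays at 1.0 because AUC depends only on how samples are ranked, and that sample still carries the highest oncocytoma score. This illustrates how high class-wise AUCs can coexist with a lower overall accuracy. In practice, scikit-learn's `roc_auc_score` with `multi_class="ovr"` offers a library route to the same one-vs-rest quantity.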