“…Although numerous models have been proposed based on differentiable ensembles 45,46,47,48,49, leveraging attention-based transformer neural networks 35,50,51,52,53,54, as well as other approaches 55,56,57,58,59,60, recent work on the systematic evaluation of deep tabular models 35,44 shows that there is no universally best model capable of consistently outperforming GBDT. Transformer-based models have been shown to be the strongest competitors of GBDT 35,50,54,61,62, especially when coupled with a powerful hyperparameter tuning toolkit 35,63.…”