“…Although numerous models have been proposed based on differentiable ensembles 45,46,47,48,49, leveraging attention-based transformer neural networks 35,50,51,52,53,54, as well as other approaches 55,56,57,58,59,60, recent work on the systematic evaluation of deep tabular models 35,44 shows that there is no universally best model capable of consistently outperforming GBDT. Transformer-based models have been shown to be the strongest competitors of GBDT 35,50,54,61,62, especially when coupled with a powerful hyperparameter tuning toolkit 35,63.…”