2021
DOI: 10.48550/arxiv.2108.12296
Preprint

Contrastive Mixup: Self- and Semi-Supervised learning for Tabular Domain

Abstract: Recent literature in self-supervised learning has demonstrated significant progress in closing the gap between supervised and unsupervised methods in the image and text domains. These methods rely on domain-specific augmentations that are not directly amenable to the tabular domain. Instead, we introduce Contrastive Mixup, a semi-supervised learning framework for tabular data, and demonstrate its effectiveness in limited annotated data settings. Our proposed method leverages Mixup-based augmentation under the manifold as…
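The abstract is truncated, but the general idea of interpolation-based augmentation for contrastive pretraining on tabular rows can be illustrated with a short sketch. The encoder, loss, and hyperparameters below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: Mixup-style interpolation in a learned representation space,
# used to build positive pairs for contrastive pretraining on tabular data.
# All names and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TabularEncoder(nn.Module):
    def __init__(self, num_features: int, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_features, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )

    def forward(self, x):
        return self.net(x)

def mixup_contrastive_loss(encoder, x, alpha: float = 0.2, temperature: float = 0.5):
    """Treat each row and its Mixup interpolation (in latent space) as a positive pair."""
    z = encoder(x)                                  # (B, D) row representations
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(z.size(0))
    z_mix = lam * z + (1.0 - lam) * z[perm]         # manifold-Mixup-style augmentation
    z1 = F.normalize(z, dim=1)
    z2 = F.normalize(z_mix, dim=1)
    logits = z1 @ z2.t() / temperature              # (B, B) cosine-similarity logits
    targets = torch.arange(z.size(0))               # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

# Usage: x = torch.randn(64, 20); loss = mixup_contrastive_loss(TabularEncoder(20), x)
```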

Cited by 2 publications (3 citation statements) | References 16 publications
“…Self-supervised VPCL. Most SSL tabular methods work on the whole fixed set of columns [2,24,11], which incurs high computational cost and is prone to overfitting. Instead, we take tabular vertical partitions to build positive and negative samples for CL under the hypothesis that a powerful representation should model view-invariant factors.…”
Section: Self-supervised and Supervised Pretraining of TransTab
confidence: 99%
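The vertical-partition idea described in the statement above can be sketched as follows: columns are split into disjoint partitions, partitions of the same row form positive pairs, and partitions drawn from different rows serve as negatives. The partitioning and pairing scheme below is an assumption made for illustration, not the TransTab implementation.

```python
# Illustrative sketch of building contrastive pairs from vertical (column-wise)
# partitions of a table; partition sizes and the pairing scheme are assumptions.
import numpy as np

def vertical_partitions(X: np.ndarray, num_partitions: int = 2):
    """Split the feature columns into disjoint views of every row."""
    cols = np.array_split(np.random.permutation(X.shape[1]), num_partitions)
    return [X[:, idx] for idx in cols]

def contrastive_pairs(X: np.ndarray):
    """Partitions of the same row are positives; partitions of different rows are negatives."""
    view_a, view_b = vertical_partitions(X, num_partitions=2)
    positives = list(zip(view_a, view_b))              # same row, different column subsets
    negatives = [(view_a[i], view_b[j])                # different rows
                 for i in range(len(X)) for j in range(len(X)) if i != j]
    return positives, negatives

# Usage: X = np.random.randn(8, 10); pos, neg = contrastive_pairs(X)
```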
“…However, it was argued that boosting algorithms and MLPs are still competitive choices for tabular data modeling, especially when the sample size is small [32,46,39,47]. To alleviate the label scarcity issue, SSL pretraining on unlabeled tabular data was introduced [2,24,10,9,11]. Nonetheless, none of them is transferable across tables and thus able to extend the success of pretraining to the tabular domain.…”
Section: Related Work
confidence: 99%
“…Owing to the lack of clear feature relationships in tabular data, fully connected dense neural networks are typically used as a parametric method for training to consider the impact of all features on the target values in a supervised setting [8,12]. Some methods have been proposed to enable deep feature learning in a contrastive learning paradigm for tabular data; however, they all use dense-layer networks [2,5,14,21]. The main drawback of dense layers is that they learn global patterns using all features.…”
Section: Introduction
confidence: 99%