Towards Class-Balancing Human Comfort Datasets with GANs

Quintana, Matias; Miller, Clayton

doi:10.1145/3360322.3361016

Cited by 14 publications

(9 citation statements)

References 6 publications


“…rebalancing (Engelmann and Lessmann, 2021; Quintana and Miller, 2019; Koivu et al., 2020; Darabi and Elor, 2021) in particular. Another highly relevant topic is privacy-aware machine learning (Choi et al., 2017; Fan et al., 2020; Kamthe et al., 2021), where generated data can be used to overcome privacy concerns.…”

Section: Tabular Data Generation (mentioning)

confidence: 99%

Deep Neural Networks and Tabular Data: A Survey

Borisov, Leemann, Seßler et al. 2021

Preprint


Heterogeneous tabular data are the most commonly used form of data and are essential for numerous critical and computationally demanding applications. On homogeneous data sets, deep neural networks have repeatedly shown excellent performance and have therefore been widely adopted. However, their application to modeling tabular data (inference or generation) remains highly challenging. This work provides an overview of state-of-the-art deep learning methods for tabular data. We start by categorizing them into three groups: data transformations, specialized architectures, and regularization models. We then provide a comprehensive overview of the main approaches in each group. A discussion of deep learning approaches for generating tabular data is complemented by strategies for explaining deep models on tabular data. Our primary contribution is to address the main research streams and existing methodologies in this area, while highlighting relevant challenges and open research questions. To the best of our knowledge, this is the first in-depth look at deep learning approaches for tabular data. This work can serve as a valuable starting point and guide for researchers and practitioners interested in deep learning with tabular data.


“…Recently, Engelmann and Lessmann [54] also examined the ability of GANs to generate data in a structured (tabular) rather than unstructured (image) context, specifically in the field of credit scoring. Like Quintana and Miller [55], these researchers sought to generate and use both continuous and categorical explanatory variables. The authors opted for a Wasserstein GAN [56] architecture, with adjustments such as using the Gumbel-softmax activation function [57] in combination with embedding layers [58] to model discrete numerical variables, and min-max scaling paired with the addition of Gaussian noise data to avoid Discriminator detection of a trivial pattern ("number of loyalty points", for example, which in the real dataset only appears in increments of ten).…”

Section: Financial Transactions (mentioning)

confidence: 99%
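The two encoding tricks named in this excerpt can be illustrated with a small, dependency-free Python sketch. It is not Engelmann and Lessmann's implementation: the function names, the temperature `tau`, and the noise scale `sigma` are hypothetical choices made only for illustration. `gumbel_softmax` turns categorical logits into a continuous, nearly one-hot vector, and `minmax_with_noise` scales a numeric column and jitters it so that a step pattern such as loyalty points in increments of ten is no longer an exact giveaway.

```python
import math
import random


def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample from categorical logits (Gumbel-softmax trick)."""
    # Gumbel(0, 1) noise via inverse CDF: g = -log(-log(u)), u ~ Uniform(0, 1).
    # u is kept away from 0 and 1 so both logarithms stay finite.
    gumbels = [-math.log(-math.log(rng.uniform(1e-12, 1.0 - 1e-12))) for _ in logits]
    scores = [(logit + g) / tau for logit, g in zip(logits, gumbels)]
    top = max(scores)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def minmax_with_noise(values, sigma, rng):
    """Scale a numeric column to [0, 1], then add N(0, sigma^2) noise to each value."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero on a constant column
    return [(v - lo) / span + rng.gauss(0.0, sigma) for v in values]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical logits for a three-category variable.
    print(gumbel_softmax([2.0, 0.5, -1.0], tau=0.5, rng=rng))
    # A column that only takes multiples of ten, like the loyalty-points example.
    print(minmax_with_noise([0, 10, 20, 30, 40, 50], sigma=0.01, rng=rng))
```

Lower values of `tau` push the Gumbel-softmax output closer to a hard one-hot vector, at the cost of higher-variance gradients when the relaxation is used inside a trained network.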

“…Another mostly unexplored application of GANs in imbalanced data settings in the realm of tabular datasets comes relating to human sentiment. Quintana and Miller [55] attempt to remedy class imbalance in a human comfort dataset [62], a dataset inquiring of participants the satisfaction with their living environments which contains a sizeable majority of "0" (neutral) labels. The research duo examines the performance of the Tabular-GAN framework developed by Xu and Veeramachaneni [63], as well as no treatment, GANCorr, and a basic GAN as baselines, all with a 70-30 train-test split and with KNN, Naïve Bayesian, and SVM learners.…”

Section: Other Disciplines (mentioning)

confidence: 99%
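The evaluation protocol described in this excerpt (a 70-30 train-test split, then a conventional learner) can be sketched as follows. This is a hedged illustration rather than the authors' code: a hand-written k-nearest-neighbours classifier stands in for the KNN learner, and `random_oversample` is a hypothetical placeholder for the balancing step, where a GAN-based method would append generated rows instead of duplicates. One assumption worth stating: balancing is applied to the training split only, so the 30% test split keeps the real class distribution.

```python
import random
from collections import Counter


def split_70_30(rows, labels, rng):
    """Shuffle row indices, then keep 70% for training and 30% for testing."""
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    cut = (7 * len(idx)) // 10  # integer arithmetic avoids float rounding
    train = [(rows[i], labels[i]) for i in idx[:cut]]
    test = [(rows[i], labels[i]) for i in idx[cut:]]
    return train, test


def random_oversample(train, rng):
    """Hypothetical placeholder balancer: duplicate random rows of each smaller
    class until it matches the majority count. A GAN-based balancer would
    append generated rows here instead of copies."""
    counts = Counter(label for _, label in train)
    target = max(counts.values())
    balanced = list(train)
    for label, n in counts.items():
        pool = [row for row in train if row[1] == label]
        balanced.extend(rng.choice(pool) for _ in range(target - n))
    return balanced


def knn_predict(train, x, k=3):
    """Majority vote among the k training rows closest to x (squared Euclidean)."""
    nearest = sorted(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def holdout_accuracy(train, test, k=3):
    """Fraction of held-out rows the k-NN classifier labels correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy, made-up data: 16 "neutral" (0) rows and 4 "uncomfortable" (1) rows.
    rows = [[float(i), float(i % 3)] for i in range(20)]
    labels = [0] * 16 + [1] * 4
    train, test = split_70_30(rows, labels, rng)
    balanced = random_oversample(train, rng)  # balance the training split only
    print(Counter(label for _, label in balanced), holdout_accuracy(balanced, test))
```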

The use of generative adversarial networks to alleviate class imbalance in tabular data: a survey

Sauber-Cole, Khoshgoftaar, 2022

J Big Data


The existence of class imbalance in a dataset can greatly bias the classifier towards majority classification. This discrepancy can pose a serious problem for deep learning models, which require copious and diverse amounts of data to learn patterns and output classifications. Traditionally, data-level and algorithm-level techniques have been instrumental in mitigating the adverse effect of class imbalance. With the recent development and proliferation of Generative Adversarial Networks (GANs), researchers across a variety of disciplines have adapted the architecture of GANs and implemented them on imbalanced datasets to generate instances of the underrepresented class(es). Though the bulk of research has been centered on the application of this methodology in computer vision tasks, GANs are likewise being appropriated for use in tabular data, or data consisting of rows and columns with traditional structured data types. In this survey paper, we assess the methodology and efficacy of these modifications on tabular datasets, across domains such as network traffic classification and financial transactions, over the past seven years. We examine what methodologies and experimental factors have resulted in the greatest machine learning efficacy, as well as the research works and frameworks which have proven most influential in the development of the application of GANs in tabular data settings. Specifically, we note the prevalence of the CGAN architecture, the optimality of novel methods with CNN learners and minority-class sensitive measures such as F1 score, the popularity of SMOTE as a baseline technique, and the improved performance in the year-over-year use of GANs in imbalanced tabular datasets.
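Two ideas this abstract mentions, SMOTE as a baseline and F1 as a minority-sensitive measure, fit in a few lines of plain Python. The sketch below is a simplified version of SMOTE's interpolation step (numeric features only, brute-force neighbour search) plus a macro-averaged F1; the function names are illustrative and are not taken from any library.

```python
import random


def smote(minority, n_new, k, rng):
    """Simplified SMOTE: synthesize n_new rows by interpolating between a random
    minority row and one of its k nearest minority neighbours.
    Assumes numeric features and at least two minority rows."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: dist2(x, m))[:k]
        x_nn = rng.choice(neighbours)
        u = rng.random()  # position along the segment from x to x_nn
        synthetic.append([a + u * (b - a) for a, b in zip(x, x_nn)])
    return synthetic


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For a majority-class predictor on labels `[0, 0, 0, 1]`, accuracy is 0.75 while `macro_f1([0, 0, 0, 1], [0, 0, 0, 0])` is 3/7 (about 0.43), which is why minority-sensitive measures are preferred in imbalanced settings.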


“…To bridge the gap of generative methods for imbalanced and numerical thermal comfort datasets, and building on previous work [41] , we propose comfortGAN, a conditional Wasserstein GAN with gradient penalty (cWGAN-GP) as a class balancing algorithm for data-driven thermal comfort modeling instead of commonly used methods. We assessed the performance of a balanced thermal comfort dataset, composed of generated and real samples, on a multi-class classification model, on scenarios where comfort feedback can take as much as seven distinct values, as well as a reduced version with only three possible values.…”

Section: Related Work and Novelty (mentioning)

confidence: 99%
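Balancing with a conditional generator such as comfortGAN comes down to asking the generator, once per class, for enough rows to reach the majority-class count. The sketch assumes a trained generator exposed as a hypothetical callable `generate(label, n)`; it does not implement the cWGAN-GP itself, and the toy generator in the usage lines is only a stand-in that samples around a per-class value.

```python
import random
from collections import Counter


def balance_with_conditional_generator(rows, labels, generate):
    """Top up every class to the majority-class count with synthetic rows.

    `generate(label, n)` is a hypothetical stand-in for a trained conditional
    generator; it must return n feature rows for the requested label.
    Real rows are kept unchanged and synthetic rows are appended after them.
    """
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label in sorted(counts):
        missing = target - counts[label]
        if missing > 0:
            out_rows.extend(generate(label, missing))
            out_labels.extend([label] * missing)
    return out_rows, out_labels


if __name__ == "__main__":
    rng = random.Random(0)

    def toy_generate(label, n):
        # Stand-in only: one feature sampled around the label value.
        return [[label + rng.gauss(0.0, 0.5)] for _ in range(n)]

    rows = [[0.1], [0.0], [-0.2], [0.3], [1.1], [-0.9]]
    labels = [0, 0, 0, 0, 1, -1]  # made-up votes, mostly neutral
    new_rows, new_labels = balance_with_conditional_generator(rows, labels, toy_generate)
    print(Counter(new_labels))
```

Keeping all real rows and only appending synthetic ones matches the abstract's description of a balanced dataset "comprised of real and generated samples".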

“…Subsequent modifications of WGAN, known as WGAN-gradient penalty (WGAN-GP) [21], enhance training stability and have shown better results and convergence in practice compared to conventionally used image-based GAN variants (e.g., convolutional GANs), specifically on tabular data from other fields [46,48]. Therefore, we move away from the vanilla architecture used in [41], and in this work we use the WGAN-GP loss variant for comfortGAN.…”

Section: Customized GAN for Thermal Comfort (mentioning)

confidence: 99%
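The WGAN-GP critic minimises E[D(x̃)] − E[D(x)] + λ·E[(‖∇D(x̂)‖₂ − 1)²], where x̃ are generated rows, x are real rows, and x̂ are random interpolates between them; λ = 10 is the value used in the original WGAN-GP paper. The sketch below computes that objective for toy critics in plain Python, using central finite differences instead of automatic differentiation, so it illustrates the loss rather than how comfortGAN is trained; all function names are hypothetical.

```python
import math
import random


def numerical_grad(f, x, eps=1e-6):
    """Central-difference estimate of the gradient of scalar f at point x."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def gradient_penalty(critic, real, fake, lam, rng):
    """lam * mean over pairs of (||grad critic(x_hat)||_2 - 1)^2,
    where x_hat = e * x_real + (1 - e) * x_fake and e ~ Uniform(0, 1)."""
    total = 0.0
    for x_r, x_f in zip(real, fake):
        e = rng.random()
        x_hat = [e * a + (1 - e) * b for a, b in zip(x_r, x_f)]
        norm = math.sqrt(sum(g * g for g in numerical_grad(critic, x_hat)))
        total += (norm - 1.0) ** 2
    return lam * total / len(real)


def critic_loss(critic, real, fake, lam, rng):
    """WGAN-GP critic objective to minimise: E[D(fake)] - E[D(real)] + penalty."""
    mean_fake = sum(critic(x) for x in fake) / len(fake)
    mean_real = sum(critic(x) for x in real) / len(real)
    return mean_fake - mean_real + gradient_penalty(critic, real, fake, lam, rng)


def unit_norm_critic(x):
    # Toy linear critic D(x) = w . x with ||w|| = 1, so the penalty is ~0.
    return 0.6 * x[0] + 0.8 * x[1]


def norm_two_critic(x):
    # Toy linear critic with ||w|| = 2, so the penalty is ~lam * (2 - 1)^2.
    return 1.2 * x[0] + 1.6 * x[1]


if __name__ == "__main__":
    rng = random.Random(0)
    real = [[rng.gauss(1.0, 0.1), rng.gauss(1.0, 0.1)] for _ in range(8)]
    fake = [[rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1)] for _ in range(8)]
    print(gradient_penalty(unit_norm_critic, real, fake, lam=10.0, rng=rng))
    print(gradient_penalty(norm_two_critic, real, fake, lam=10.0, rng=rng))
    print(critic_loss(norm_two_critic, real, fake, lam=10.0, rng=rng))
```

The penalty pulls the critic's gradient norm towards 1 along the interpolation paths, a soft version of the 1-Lipschitz constraint that the original WGAN enforced by weight clipping.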

Balancing thermal comfort datasets: We GAN, but should we?

Quintana, Schiavon, Tham et al. 2020

Preprint

Self-citation


Thermal comfort assessment for the built environment has become more available to analysts and researchers due to the proliferation of sensors and subjective feedback methods. These data can be used for modeling comfort behavior to support design and operations towards energy efficiency and well-being. By nature, occupant subjective feedback is imbalanced as indoor conditions are designed for comfort, and responses indicating otherwise are less common. This situation creates a scenario for the machine learning workflow where class balancing as a pre-processing step might be valuable for developing predictive thermal comfort classification models with high performance. This paper investigates the various thermal comfort dataset class balancing techniques from the literature and proposes a modified conditional Generative Adversarial Network (GAN), comfortGAN, to address this imbalance scenario. These approaches are applied to three publicly available datasets, ranging from 30 and 67 participants to a global collection of thermal comfort datasets, with 1,474; 2,067; and 66,397 data points, respectively. This work finds that a classification model trained on a balanced dataset, comprised of real and generated samples from comfortGAN, has higher performance (increase between 4% and 17% in classification accuracy) than other augmentation methods tested. However, when classes representing discomfort are merged and reduced to three, better imbalanced performance is expected, and the additional increase in performance by comfortGAN shrinks to 1-2%. These results illustrate that class balancing for thermal comfort modeling is beneficial using advanced techniques such as GANs, but its value is diminished in certain scenarios. A discussion is provided to assist potential users in determining in which scenarios this process is useful and which method works best.
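The reduced scenario in this abstract merges discomfort classes into three. A minimal sketch of that relabelling and of its effect on the imbalance ratio follows. The mapping by the sign of a seven-point vote (−3 to +3) is an assumption for illustration, not necessarily the paper's mapping, and the vote counts in the usage lines are made up rather than taken from any of the three datasets.

```python
from collections import Counter


def merge_to_three(vote):
    """Collapse a seven-point vote (-3 ... +3) into three classes by its sign.
    This sign-based mapping is an illustrative assumption."""
    if vote < 0:
        return "cool"
    if vote > 0:
        return "warm"
    return "neutral"


def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest class."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


if __name__ == "__main__":
    # Made-up vote counts, skewed towards neutral (0) as the abstract describes.
    votes = [0] * 60 + [1] * 15 + [-1] * 12 + [2] * 5 + [-2] * 4 + [3] * 2 + [-3] * 2
    print(imbalance_ratio(votes))                               # seven classes
    print(imbalance_ratio([merge_to_three(v) for v in votes]))  # three classes
```

With these made-up counts the majority-to-minority ratio falls from 30 on the seven-point scale to about 3.3 after merging, which gives one intuition for why the extra benefit of GAN-based balancing shrinks in the three-class case.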
