2021
DOI: 10.48550/arxiv.2106.11959
Preprint

Revisiting Deep Learning Models for Tabular Data

Abstract: The necessity of deep learning for tabular data is still an unanswered question addressed by a large number of research efforts. The recent literature on tabular DL proposes several deep architectures reported to be superior to traditional "shallow" models like Gradient Boosted Decision Trees. However, since existing works often use different benchmarks and tuning protocols, it is unclear if the proposed models universally outperform GBDT. Moreover, the models are often not compared to each other, therefore, i…

Cited by 22 publications (51 citation statements)
References 14 publications
“…4.2.1) and transformer-based (Sec. 4.2.2) groups exhibit superior predictive performance compared to plain deep neural networks on various data sets (Gorishniy et al., 2021; Ke et al., 2018, 2019; Somepalli et al., 2021). This underlines the importance of special-purpose architectures for tabular data.…”
Section: Summary and Trends
confidence: 96%
“…We also discuss the key categorical data encoding methods in Section 4.1.1. Gorishniy et al. (2021) empirically evaluated a large number of state-of-the-art deep learning approaches for tabular data on a wide range of data sets. Interestingly, the authors demonstrated that a tuned deep neural network model with the ResNet-like architecture (He et al., 2016) shows comparable performance to some state-of-the-art deep learning approaches for tabular data.…”
Section: Related Work
confidence: 99%
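The ResNet-like baseline mentioned in the statement above follows a residual-block pattern (normalize, two linear layers with a ReLU in between, skip connection). The following is a minimal NumPy sketch of that general pattern, not the authors' exact implementation; the normalization choice, sizes, and weights here are all illustrative:

```python
# Sketch of a ResNet-style block for tabular features:
# normalize -> linear -> ReLU -> linear -> residual add.
# Purely illustrative; not the implementation from the cited paper.
import numpy as np

def resnet_block(x, w1, b1, w2, b2, eps=1e-5):
    # Normalize each row over the feature dimension (LayerNorm-style).
    z = (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)
    z = np.maximum(z @ w1 + b1, 0.0)   # linear up-projection + ReLU
    z = z @ w2 + b2                    # linear back to the input width
    return x + z                       # residual (skip) connection

rng = np.random.default_rng(0)
d, hidden = 8, 16
x = rng.normal(size=(4, d))            # a batch of 4 tabular rows
out = resnet_block(
    x,
    rng.normal(size=(d, hidden)), np.zeros(hidden),
    rng.normal(size=(hidden, d)), np.zeros(d),
)
print(out.shape)  # (4, 8)
```

The skip connection keeps the block's output the same width as its input, so such blocks can be stacked to arbitrary depth.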
“…After searching reports about his activities during that period, we find that he was racing cars for Porsche, 8 and attending fashion shows as ambassador of Louis Vuitton. 9 Apart from "Wu Yifan", we also observe topics related to "live stream", which is also a new topic that has not been observed in previous analysis.
ah ah ah (0.045), like (0.041), thanks (0.035), mom (0.019), new (0.015), teacher (0.014), cry (0.011), shoot (0.010), character (0.009), great (0.008), rush (0.008), baby (0.008), song (0.008), clarify (0.008), donate (0.008), live (0.007), hope (0.007), become (0.007), Beijing (0.007), dry (0.007)
9: cute (0.029), china (0.025), support (0.020), Zhang Zhehan (0.020), child (0.014), apologize (0.013), woman (0.012), nation (0.010), gong jun (0.009), stand (0.008), friend (0.008), engage in (0.008), write (0.008), society (0.007), law (0.007), girl (0.006), chance (0.006), sad (0.006), long (0.005), willing (0.005)
10: Wu Yifan (0.126), Mr (0.048), endorsement (0.028), spokesperson (0.026), road (0.023), expect (0.022), easy Vuitton (0.020), brand (0.019), force (0.016), worldwide (0.016), music (0.014), nice (0.011), cattle (0.009), silly (0.009), high (0.008), congratulations (0.008), Wu (0.007), Li Shubai (0.007), racer (0.006), wish (0.006)…”
Section: Temporal Analysis
confidence: 49%
“…To measure the influence, we construct a decision tree-based model and analyze the feature importance. A recent study by Gorishniy et al. [9] found that, for heterogeneous data, the baseline performance of GBDT (Gradient Boosted Decision Trees) is strictly superior to that of DNNs (Deep Neural Networks). In addition, decision tree-based models are intrinsically more interpretable than deep neural networks.…”
Section: Feature Importance Analysis 5.1 Model Selection
confidence: 99%
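The workflow described in the statement above — fit a gradient-boosted tree ensemble, then read off per-feature importances — can be sketched with scikit-learn. The dataset below is synthetic and purely illustrative, not the data from the citing study:

```python
# Sketch: feature importance from a gradient-boosted tree model (scikit-learn).
# Synthetic data; illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=6, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Impurity-based importances: one non-negative score per feature,
# normalized to sum to 1.
for i, score in enumerate(model.feature_importances_):
    print(f"feature_{i}: {score:.3f}")
```

Note that impurity-based importances are computed from the training data and can overstate high-cardinality features; permutation importance on held-out data is a common cross-check.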