2023
DOI: 10.1162/tacl_a_00544
Transformers for Tabular Data Representation: A Survey of Models and Applications

Abstract: In the last few years, the natural language processing community has witnessed advances in neural representations of free texts with transformer-based language models (LMs). Given the importance of knowledge available in tabular data, recent research efforts extend LMs by developing neural representations for structured data. In this article, we present a survey that analyzes these efforts. We first abstract the different systems according to a traditional machine learning pipeline in terms of training data, i…

Cited by 15 publications (4 citation statements)
References 69 publications
“…More generally, limited data availability constrains the use of more advanced deep learning architectures in rational nanozyme design, even though there are many examples of their successful applications. 56,57 Another challenge is data preprocessing, which so far has only been possible through rigorous analysis of scientific publications by domain experts and manual data extraction. With the appearance of LLMs, now available both commercially (e.g., GPT-4 34 ) and as open-source models (e.g., Llama-2 58 ), we are presented with an opportunity to automate or largely simplify this process.…”
Section: Prediction Of Multiple Catalytic Activities
confidence: 99%
“…However, DL-based methods are more readily adapted to transfer learning because they are easier to pre-train than tree-based methods. Among others, self-supervised learning with Transformers is the most common pre-training approach for DL-based methods (see, e.g., Badaro et al. (2023) for a survey).…”
Section: Related Work
confidence: 99%
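The self-supervised pre-training with Transformers that this excerpt refers to is commonly implemented as a masked-value (cell reconstruction) objective over table rows. The following is a minimal sketch of that idea, assuming pre-discretized column values and toy hyperparameters; it is an illustration only, not the setup of any specific system covered by the survey.

```python
# Minimal sketch of masked-cell self-supervised pre-training on tabular data.
# All sizes, the masking rate, and the toy data are illustrative assumptions.
import torch
import torch.nn as nn

class TabularMaskedEncoder(nn.Module):
    def __init__(self, n_columns, n_values_per_column, d_model=64):
        super().__init__()
        # One embedding table per column; values are assumed pre-discretized into ids.
        # The id n_values_per_column is reserved for the [MASK] token.
        self.value_emb = nn.ModuleList(
            [nn.Embedding(n_values_per_column + 1, d_model) for _ in range(n_columns)]
        )
        self.col_emb = nn.Embedding(n_columns, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.heads = nn.ModuleList(
            [nn.Linear(d_model, n_values_per_column) for _ in range(n_columns)]
        )
        self.mask_id = n_values_per_column

    def forward(self, rows, mask):
        # rows: (batch, n_columns) long tensor of value ids; mask: bool tensor, same shape.
        masked = rows.masked_fill(mask, self.mask_id)
        tokens = torch.stack(
            [self.value_emb[c](masked[:, c]) for c in range(rows.size(1))], dim=1
        )
        tokens = tokens + self.col_emb(torch.arange(rows.size(1), device=rows.device))
        hidden = self.encoder(tokens)  # (batch, n_columns, d_model)
        # One reconstruction head per column predicts the original value id.
        return [self.heads[c](hidden[:, c]) for c in range(rows.size(1))]

# Toy masked-cell pre-training step on random data (15% of cells masked).
n_cols, n_vals = 6, 20
model = TabularMaskedEncoder(n_cols, n_vals)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

rows = torch.randint(0, n_vals, (32, n_cols))
mask = torch.rand(32, n_cols) < 0.15
logits = model(rows, mask)
loss = sum(
    nn.functional.cross_entropy(logits[c][mask[:, c]], rows[mask[:, c], c])
    for c in range(n_cols)
    if mask[:, c].any()
)
loss.backward()
optimizer.step()
print(f"masked-cell reconstruction loss: {loss.item():.3f}")
```

After pre-training, the encoder would typically be fine-tuned on a downstream tabular task (e.g., row classification), which is the transfer-learning use the quoted passage contrasts with tree-based methods.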
“…Transformer [17] is an attention-based architecture originally proposed as a sequence-to-sequence model for machine translation tasks. In recent years, owing to its outstanding results in the field of Natural Language Processing (NLP) [18][19][20][21], it has attracted wide attention from researchers in computer vision [22], and a growing number of researchers are applying it to computer vision tasks such as object detection, video processing, and image processing.…”
Section: Transformer
confidence: 99%
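The core operation of the attention-based architecture described in this excerpt is scaled dot-product attention. Below is a minimal sketch of that operation; the tensor shapes and toy inputs are illustrative assumptions, not taken from any of the cited papers.

```python
# Minimal sketch of scaled dot-product attention, the core operation of the
# Transformer architecture mentioned above. Shapes and inputs are toy examples.
import torch

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, seq_len, d_k); weights are softmax(q k^T / sqrt(d_k)).
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5   # (batch, seq_len, seq_len)
    weights = torch.softmax(scores, dim=-1)
    return weights @ v                              # (batch, seq_len, d_k)

# Toy usage: one sequence of 5 tokens with 8-dimensional projections.
q = torch.randn(1, 5, 8)
k = torch.randn(1, 5, 8)
v = torch.randn(1, 5, 8)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([1, 5, 8])
```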