2021
DOI: 10.3390/e23111422

Language Representation Models: An Overview

Abstract: In the last few decades, text mining has been used to extract knowledge from free texts. Applying neural networks and deep learning to natural language processing (NLP) tasks has led to many accomplishments for real-world language problems over the years. The developments of the last five years have resulted in techniques that have allowed for the practical application of transfer learning in NLP. The advances in the field have been substantial, and the milestone of outperforming human baseline performance has…

Cited by 14 publications (8 citation statements) | References 31 publications

Citation statements (ordered by relevance):
“…These acquired representations can subsequently be adapted for specific downstream tasks, including sentiment analysis, named entity recognition, and machine translation. [46,47,48,49] The success of generative pretraining in natural language processing can be attributed to its ability to learn rich and meaningful representations of language that can be used for a variety of downstream tasks.…”
Section: Generative Pretraining for Embedding Generation
Citation type: mentioning (confidence: 99%)
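The adaptation the citing authors describe is, in practice, a fine-tuning step on a small labelled set. Below is a minimal sketch, assuming the Hugging Face transformers API; the checkpoint name, example texts, and hyperparameters are illustrative assumptions, not settings from the cited works.

```python
# Hedged sketch: adapting a pretrained model's representations to a downstream
# sentiment-analysis task (checkpoint, texts, and hyperparameters are illustrative).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased"           # any pretrained encoder could stand in here
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

texts = ["The movie was wonderful.", "A dull and lifeless plot."]
labels = torch.tensor([1, 0])                    # 1 = positive, 0 = negative

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
loss = model(**batch, labels=labels).loss        # pretrained encoder + fresh classification head
loss.backward()                                  # the small labelled set drives the adaptation
optimizer.step()
```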
“…Pre-trained language models, such as BERT [23], BERT-WWM [24], RoBERTa [25], and NEZHA [26], have gradually become a fundamental technique for NLP, with great success on both English and Chinese tasks [27]. In our approach, we use the BERT and NEZHA feature extraction layers.…”
Section: Related Work
Citation type: mentioning (confidence: 99%)
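A minimal sketch of using a pretrained BERT encoder as a feature-extraction layer, assuming the Hugging Face transformers API; the bert-base-chinese checkpoint is an illustrative assumption, and a NEZHA checkpoint would be loaded analogously. This is not the cited paper's exact code.

```python
# Hedged sketch: a pretrained BERT encoder used as a frozen feature-extraction layer
# (the checkpoint is illustrative; a NEZHA checkpoint would be loaded analogously).
import torch
from transformers import AutoTokenizer, AutoModel

checkpoint = "bert-base-chinese"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
encoder = AutoModel.from_pretrained(checkpoint)
encoder.eval()                                    # features only; no fine-tuning in this sketch

inputs = tokenizer("Pre-trained language models are a fundamental NLP technique.",
                   return_tensors="pt")
with torch.no_grad():
    token_features = encoder(**inputs).last_hidden_state   # (1, seq_len, hidden_size)
sentence_feature = token_features[:, 0]           # [CLS] vector as a sentence-level feature
print(sentence_feature.shape)                     # torch.Size([1, 768]) for a base-size model
```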
“…We adopt a mini-batch mechanism to train our model. As seen in the two plots above (Figure 4), we conducted experiments with batch sizes of [4, 8, 16, 32, 64] and finally chose 8 as the better batch size. As for the learning rate, our experiments showed that choosing different learning rates for different parameters works better.…”
Section: Implementation Details
Citation type: mentioning (confidence: 99%)
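The two tuning choices mentioned in the quote, a batch-size sweep over [4, 8, 16, 32, 64] and different learning rates for different parameters, can be expressed with PyTorch optimizer parameter groups. The sketch below is illustrative only; the model, data, and learning rates are assumptions, not the cited paper's settings.

```python
# Hedged sketch: per-parameter-group learning rates plus a batch-size sweep,
# mirroring the tuning procedure described above (model, data, and rates are illustrative).
import torch
from torch.utils.data import DataLoader, TensorDataset

encoder = torch.nn.Linear(768, 768)              # stand-in for pretrained feature-extraction layers
head = torch.nn.Linear(768, 2)                   # stand-in for the task-specific classifier
model = torch.nn.Sequential(encoder, head)

optimizer = torch.optim.AdamW([
    {"params": encoder.parameters(), "lr": 2e-5},   # small rate for pretrained parameters
    {"params": head.parameters(),    "lr": 1e-3},   # larger rate for the randomly initialized head
])

dataset = TensorDataset(torch.randn(256, 768), torch.randint(0, 2, (256,)))
for batch_size in [4, 8, 16, 32, 64]:            # candidate sizes from the quoted experiments
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    x, y = next(iter(loader))
    loss = torch.nn.functional.cross_entropy(model(x), y)
    print(f"batch_size={batch_size:>2}  initial loss={loss.item():.3f}")
```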
“…Using a representation language model pre-trained on large-scale unlabeled text is a universal and effective method for most natural language understanding tasks [16]. Most of these language models use self-supervised training methods.…”
Section: Related Work
Citation type: mentioning (confidence: 99%)
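Much of the self-supervised training the quote refers to is masked language modelling, where the model reconstructs tokens hidden from its input. Below is a minimal sketch, assuming the Hugging Face transformers API; the checkpoint and the masked sentence are illustrative assumptions.

```python
# Hedged sketch: the self-supervised masked-language-modelling objective used to
# pre-train many representation models (checkpoint and sentence are illustrative).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

text = f"Language models learn rich {tokenizer.mask_token} of text."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_pos].argmax(dim=-1)
print(tokenizer.decode(predicted_id))            # the model's guess for the masked token
```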