2021
DOI: 10.48550/arxiv.2109.09138
Preprint

Multi-Task Learning in Natural Language Processing: An Overview

Abstract: Deep learning approaches have achieved great success in the field of Natural Language Processing (NLP). However, deep neural models often suffer from overfitting and data scarcity problems that are pervasive in NLP tasks. In recent years, Multi-Task Learning (MTL), which can leverage useful information of related tasks to achieve simultaneous performance improvement on multiple related tasks, has been used to handle these problems. In this paper, we give an overview of the use of MTL in NLP tasks. We first rev…

Cited by 10 publications (13 citation statements)
References 132 publications

“…Generally, an MTL model can be trained by linearly combining loss functions from different tasks into a single total loss function [15]. In this way, the model can learn a shared representation for all tasks by stochastic gradient descent (SGD) with back-propagation [15,43]. Ordinarily, assuming that there are M tasks in all, the global loss function can be defined as L_total = ∑_{i=1}^{M} w_i L_i, where L_i represents the task-specific loss function and w_i denotes the weight assigned to each L_i.…”
Section: Details of MTL Architecture (mentioning)
confidence: 99%
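As a rough illustration of the weighted loss combination described in this statement, the sketch below builds L_total = ∑_i w_i L_i over two toy tasks and runs one SGD step with back-propagation through a shared representation. PyTorch is assumed; the encoder, heads, task losses, and fixed weights are hypothetical choices for the example, not taken from the surveyed paper.

```python
# Minimal sketch (PyTorch assumed): combine task losses into one weighted
# total loss, L_total = sum_i w_i * L_i, and update shared parameters with SGD.
import torch
import torch.nn as nn

# Hypothetical shared encoder and two task-specific heads (names are illustrative).
shared = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
head_a = nn.Linear(64, 3)   # e.g. a 3-class classification task
head_b = nn.Linear(64, 1)   # e.g. a regression task

params = list(shared.parameters()) + list(head_a.parameters()) + list(head_b.parameters())
optimizer = torch.optim.SGD(params, lr=0.1)

loss_a_fn = nn.CrossEntropyLoss()
loss_b_fn = nn.MSELoss()
weights = [1.0, 0.5]        # w_i: fixed task weights (assumed; in practice a hyperparameter)

# One training step on a toy batch of 8 examples shared by both tasks.
x = torch.randn(8, 32)
y_a = torch.randint(0, 3, (8,))
y_b = torch.randn(8, 1)

h = shared(x)                                            # shared representation
loss_a = loss_a_fn(head_a(h), y_a)                       # task-specific loss L_1
loss_b = loss_b_fn(head_b(h), y_b)                       # task-specific loss L_2
total_loss = weights[0] * loss_a + weights[1] * loss_b   # L_total = sum_i w_i * L_i

optimizer.zero_grad()
total_loss.backward()                                    # back-propagation through the shared encoder
optimizer.step()
```

Fixed scalar weights are the simplest weighting scheme; because both losses flow back through the same encoder, each task's gradient shapes the shared representation.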
“…Spinde et al [41] train DistilBERT [34] on combinations of bias-related datasets using a Multi-task Learning (MTL) [6,54] approach. Their best-performing MTL model achieves an F1 score of 0.776 on a subset of BABE.…”
Section: Transformer-based Detection Approaches (mentioning)
confidence: 99%
“…Specifically, MTL with transformer-based models has emerged as a popular approach to improving the performance of closely related tasks in NLP [15], [16]. In this approach, a shared transformer learns several related tasks simultaneously, such as sentence classification and word prediction, and task-specific modules yield the outcome for each task.…”
Section: A Vision Transformer (ViT) (mentioning)
confidence: 99%
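The shared-transformer setup described in this statement can be sketched roughly as follows. This is a minimal illustration, not the cited papers' architecture: the vocabulary size, model dimensions, pooling strategy, and the two heads (sentence classification and word prediction) are assumptions made for the example.

```python
# Minimal sketch: a shared Transformer encoder with two task-specific heads,
# one producing a sentence-level output and one producing per-token outputs.
import torch
import torch.nn as nn

class SharedTransformerMTL(nn.Module):
    def __init__(self, vocab_size=1000, d_model=64, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)   # shared across tasks
        self.cls_head = nn.Linear(d_model, n_classes)               # sentence classification head
        self.tok_head = nn.Linear(d_model, vocab_size)              # word (token) prediction head

    def forward(self, token_ids):
        h = self.encoder(self.embed(token_ids))       # shared representation, shape (B, T, d_model)
        sent_logits = self.cls_head(h.mean(dim=1))    # mean-pooled -> one prediction per sentence
        tok_logits = self.tok_head(h)                 # per-token -> one prediction per position
        return sent_logits, tok_logits

model = SharedTransformerMTL()
ids = torch.randint(0, 1000, (4, 12))                 # toy batch: 4 sentences, 12 tokens each
sent_logits, tok_logits = model(ids)
print(sent_logits.shape, tok_logits.shape)            # torch.Size([4, 2]) torch.Size([4, 12, 1000])
```

Both heads read the same encoder output, so gradients from either task update the shared layers; this hard parameter sharing is what lets the related tasks inform one another.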