2021
DOI: 10.48550/arxiv.2109.09138
Preprint

Multi-Task Learning in Natural Language Processing: An Overview

Abstract: Deep learning approaches have achieved great success in the field of Natural Language Processing (NLP). However, deep neural models often suffer from overfitting and data scarcity problems that are pervasive in NLP tasks. In recent years, Multi-Task Learning (MTL), which can leverage useful information of related tasks to achieve simultaneous performance improvement on multiple related tasks, has been used to handle these problems. In this paper, we give an overview of the use of MTL in NLP tasks. We first rev…

Cited by 10 publications (13 citation statements)
References 132 publications

“…Generally, an MTL model can be trained by linearly combining loss functions from different tasks into a single total loss function [15]. In this way, the model can learn a shared representation for all tasks by stochastic gradient descent (SGD) with back-propagation [15,43]. Ordinarily, assuming that there are M tasks in all, the global loss function can be defined as L_total = ∑_{i=1}^{M} w_i L_i, where L_i represents the task-specific loss function and w_i denotes the weight assigned to each L_i.…”
Section: Details of MTL Architecture (mentioning)
confidence: 99%
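As a rough illustration of the weighted loss combination described in this statement, the sketch below builds L_total = ∑_i w_i L_i over two toy tasks and runs one SGD step with back-propagation through a shared representation. PyTorch is assumed; the encoder, heads, task losses, and fixed weights are hypothetical choices for the example, not taken from the surveyed paper.

```python
# Minimal sketch (PyTorch assumed): combine task losses into one weighted
# total loss, L_total = sum_i w_i * L_i, and update shared parameters with SGD.
import torch
import torch.nn as nn

# Hypothetical shared encoder and two task-specific heads (names are illustrative).
shared = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
head_a = nn.Linear(64, 3)   # e.g. a 3-class classification task
head_b = nn.Linear(64, 1)   # e.g. a regression task

params = list(shared.parameters()) + list(head_a.parameters()) + list(head_b.parameters())
optimizer = torch.optim.SGD(params, lr=0.1)

loss_a_fn = nn.CrossEntropyLoss()
loss_b_fn = nn.MSELoss()
weights = [1.0, 0.5]        # w_i: fixed task weights (assumed; in practice a hyperparameter)

# One training step on a toy batch of 8 examples shared by both tasks.
x = torch.randn(8, 32)
y_a = torch.randint(0, 3, (8,))
y_b = torch.randn(8, 1)

h = shared(x)                                            # shared representation
loss_a = loss_a_fn(head_a(h), y_a)                       # task-specific loss L_1
loss_b = loss_b_fn(head_b(h), y_b)                       # task-specific loss L_2
total_loss = weights[0] * loss_a + weights[1] * loss_b   # L_total = sum_i w_i * L_i

optimizer.zero_grad()
total_loss.backward()                                    # back-propagation through the shared encoder
optimizer.step()
```

Fixed scalar weights are the simplest weighting scheme; because both losses flow back through the same encoder, each task's gradient shapes the shared representation.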
“…Spinde et al [41] train DistilBERT [34] on combinations of bias-related datasets using a Multi-task Learning (MTL) [6,54] approach. Their best-performing MTL model achieves an F1 score of 0.776 on a subset of BABE.…”
Section: Transformer-based Detection Approaches (mentioning)
confidence: 99%
“…Specifically, MTL with transformer-based models has emerged as a popular approach to improving the performance of closely related tasks in NLP [15], [16]. In this approach, a shared transformer learns several related tasks simultaneously, such as sentence classification and word prediction, and task-specific modules yield the outcome for each task.…”
Section: A Vision Transformer (ViT) (mentioning)
confidence: 99%
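The shared-transformer setup described in this statement can be sketched roughly as follows. This is a minimal illustration, not the cited papers' architecture: the vocabulary size, model dimensions, pooling strategy, and the two heads (sentence classification and word prediction) are assumptions made for the example.

```python
# Minimal sketch: a shared Transformer encoder with two task-specific heads,
# one producing a sentence-level output and one producing per-token outputs.
import torch
import torch.nn as nn

class SharedTransformerMTL(nn.Module):
    def __init__(self, vocab_size=1000, d_model=64, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)   # shared across tasks
        self.cls_head = nn.Linear(d_model, n_classes)               # sentence classification head
        self.tok_head = nn.Linear(d_model, vocab_size)              # word (token) prediction head

    def forward(self, token_ids):
        h = self.encoder(self.embed(token_ids))       # shared representation, shape (B, T, d_model)
        sent_logits = self.cls_head(h.mean(dim=1))    # mean-pooled -> one prediction per sentence
        tok_logits = self.tok_head(h)                 # per-token -> one prediction per position
        return sent_logits, tok_logits

model = SharedTransformerMTL()
ids = torch.randint(0, 1000, (4, 12))                 # toy batch: 4 sentences, 12 tokens each
sent_logits, tok_logits = model(ids)
print(sent_logits.shape, tok_logits.shape)            # torch.Size([4, 2]) torch.Size([4, 12, 1000])
```

Both heads read the same encoder output, so gradients from either task update the shared layers; this hard parameter sharing is what lets the related tasks inform one another.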