Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, 2017
DOI: 10.18653/v1/e17-1005
When is multitask learning effective? Semantic sequence prediction under varying data conditions

Abstract: Multitask learning has been applied successfully to a range of tasks, mostly morphosyntactic. However, little is known on when MTL works and whether there are data characteristics that help to determine its success. In this paper we evaluate a range of semantic sequence labeling tasks in a MTL setup. We examine different auxiliary tasks, amongst which a novel setup, and correlate their impact to data-dependent conditions. Our results show that MTL is not always effective, significant improvements are obtained o…

Cited by 77 publications (11 citation statements)
References 27 publications
“…Both the SFU and CD data sets not only have low label kurtosis but also have relatively low label entropy. This partially aligns with previous research (Martínez Alonso and Plank 2017; Bingel and Søgaard 2017), which suggests that for an auxiliary task to improve the main task, the entropy of the labels should be high (implying the task should not be trivial to learn), and the kurtosis should be low (the labels should not have an overly long-tailed distribution). The fact that the multi-word task is more helpful, however, seems to indicate that the appropriateness of the auxiliary task for the main task is more important than the specific data set properties.…”
Section: Model Analysis (supporting)
Confidence: 90%
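The entropy and kurtosis criteria quoted above can be computed directly from an auxiliary task's label sequence. Below is a minimal sketch, not code from any of the cited papers: it reads both statistics off the empirical label-frequency distribution, which is one plausible interpretation of the measures, and the tag counts are invented.

```python
from collections import Counter
import math

def label_entropy(labels):
    """Shannon entropy (in bits) of the empirical label distribution."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def label_kurtosis(labels):
    """Excess kurtosis of the per-label frequencies: high values signal
    a long-tailed distribution dominated by a few frequent labels."""
    counts = list(Counter(labels).values())
    n = len(counts)
    mean = sum(counts) / n
    m2 = sum((c - mean) ** 2 for c in counts) / n
    m4 = sum((c - mean) ** 4 for c in counts) / n
    return m4 / (m2 ** 2) - 3  # 0 for a normal distribution

# Hypothetical BIO tag sequence for a negation-scope auxiliary task.
tags = ["O"] * 80 + ["B-NEG"] * 15 + ["I-NEG"] * 5
print(f"entropy:  {label_entropy(tags):.3f} bits")
print(f"kurtosis: {label_kurtosis(tags):.3f}")
```

Under this reading, a near-uniform distribution over the label set yields high entropy and low kurtosis, the profile the cited work associates with helpful auxiliary tasks.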
“…Here, entropy indicates the amount of uncertainty in the label distribution, while kurtosis indicates the skewness. These measures have been shown to correlate well with the usefulness of auxiliary tasks in previous work (Martínez Alonso and Plank 2017; Bingel and Søgaard 2017).…”
Section: Model Analysis (supporting)
Confidence: 62%
“…When jointly training argument mining and discourse parsing tasks, in contrast, the results obtained for the argument mining models are worse than those obtained when the models are trained in single task settings. This effect, known as negative transfer, is not uncommon in multi-task settings [29]. In fact, multi-task learning architectures are known to be sensitive to a large number of parameters, including the distribution of the labels, the sizes of the respective datasets and the sampling strategies implemented in order to select the mini-batches when switching between tasks [54].…”
Section: Methods (mentioning)
Confidence: 99%
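One of the parameters this last statement names, the sampling strategy used to select mini-batches when switching between tasks, can be made concrete with a short sketch. The task names, dataset sizes, and helper function below are hypothetical and are not the setup of [54] or [29]:

```python
import random

def make_schedule(dataset_sizes, n_batches, strategy="proportional"):
    """Return a sequence of task names, one per mini-batch.

    'proportional' samples tasks in proportion to dataset size, so the
    larger task receives more updates; 'uniform' picks tasks evenly,
    which can let a small auxiliary task dominate training (one possible
    source of the negative transfer described above).
    """
    tasks = list(dataset_sizes)
    if strategy == "proportional":
        weights = [dataset_sizes[t] for t in tasks]
    elif strategy == "uniform":
        weights = [1] * len(tasks)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return random.choices(tasks, weights=weights, k=n_batches)

sizes = {"argument_mining": 9_000, "discourse_parsing": 45_000}  # invented sizes
schedule = make_schedule(sizes, n_batches=10, strategy="proportional")
for task in schedule:
    pass  # here one would draw a mini-batch for `task` and update the shared model
print(schedule)
```

Whether proportional or uniform scheduling works better depends on the relative sizes of the datasets involved, which is exactly the kind of sensitivity the quoted passage attributes to multi-task architectures.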