Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018
DOI: 10.18653/v1/n18-1172
Multi-Task Learning of Pairwise Sequence Classification Tasks over Disparate Label Spaces

Abstract: We combine multi-task learning and semi-supervised learning by inducing a joint embedding space between disparate label spaces and learning transfer functions between label embeddings, enabling us to jointly leverage unlabelled data and auxiliary, annotated datasets. We evaluate our approach on a variety of sequence classification tasks with disparate label spaces. We outperform strong single and multi-task baselines and achieve a new state-of-the-art for topic-based sentiment analysis.
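
The abstract's central mechanism, a joint embedding space over labels from different tasks plus learned transfer functions between label embeddings, can be sketched in a few lines. The sketch below is a minimal illustration under assumed names and shapes (`JointLabelSpace`, the linear transfer function, and the dot-product scoring are illustrative choices, not the paper's implementation):

```python
import torch
import torch.nn as nn

class JointLabelSpace(nn.Module):
    """Hypothetical sketch: embed every task's labels into one shared
    space and learn a transfer function between label spaces."""

    def __init__(self, label_counts, dim=64):
        super().__init__()
        # One embedding table per task, all living in the same d-dim space.
        self.label_emb = nn.ModuleDict(
            {task: nn.Embedding(n, dim) for task, n in label_counts.items()}
        )
        # Simple linear transfer function between label spaces (assumed form).
        self.transfer = nn.Linear(dim, dim)

    def score(self, sent_repr, task):
        # Classify by dot-product compatibility between a sentence
        # representation and the task's label embeddings.
        labels = self.label_emb[task].weight        # (n_labels, dim)
        return sent_repr @ labels.t()               # (batch, n_labels)

    def transfer_labels(self, src_task, tgt_task):
        # Map source-task label embeddings into the target label space,
        # letting auxiliary annotations act as soft target-task labels.
        src = self.transfer(self.label_emb[src_task].weight)
        tgt = self.label_emb[tgt_task].weight
        return src @ tgt.t()                        # (n_src, n_tgt)

model = JointLabelSpace({"sentiment": 3, "stance": 4})
sent_repr = torch.randn(2, 64)                      # stand-in encoder output
print(model.score(sent_repr, "sentiment").shape)    # torch.Size([2, 3])
print(model.transfer_labels("stance", "sentiment").shape)  # torch.Size([4, 3])
```

In this reading, auxiliary datasets supervise the target task through the transfer function, while unlabelled data would enter through the shared encoder that produces the sentence representation.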

Cited by 70 publications (69 citation statements). References 34 publications.
“…The performance boost is particularly significant for the out-of-domain setting, where sluice networks add more than 1 point in accuracy compared to hard parameter sharing and almost .5 compared to the strongest baseline on average, demonstrating that sluice networks are particularly useful to help a model generalize better. In contrast to previous studies on MTL (Martínez Alonso and Plank 2017; Bingel and Søgaard 2017; Augenstein, Ruder, and Søgaard 2018), our model also consistently outperforms single-task learning. Overall, this demonstrates that our meta-architecture for learning which parts of multi-task models to share, with a small set of additional parameters to learn, can achieve significant and consistent improvements over strong baseline methods.…”
Section: Training and Evaluation (contrasting)
confidence: 93%
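
The sluice-network mechanism this statement credits, learning how much two task-specific layers share rather than hard-coding shared parameters, reduces to a small set of learned mixing weights. A rough cross-stitch/sluice-style sketch (the 2x2 alpha parameterization is illustrative, not the cited model's exact formulation):

```python
import torch
import torch.nn as nn

class SoftSharingUnit(nn.Module):
    """Illustrative soft-sharing unit: a learned 2x2 matrix decides how
    much each task's hidden state draws on the other's, instead of
    hard-coding which layers are shared."""

    def __init__(self):
        super().__init__()
        # alpha[i, j]: weight of task j's state in task i's mixed state.
        # Initialised near the identity, i.e. almost no sharing.
        self.alpha = nn.Parameter(torch.eye(2))

    def forward(self, h_a, h_b):
        mixed_a = self.alpha[0, 0] * h_a + self.alpha[0, 1] * h_b
        mixed_b = self.alpha[1, 0] * h_a + self.alpha[1, 1] * h_b
        return mixed_a, mixed_b

unit = SoftSharingUnit()
a, b = unit(torch.randn(2, 64), torch.randn(2, 64))
print(a.shape, b.shape)  # torch.Size([2, 64]) torch.Size([2, 64])
```

The "small set of additional parameters" in the quote corresponds to exactly these alpha weights: one tiny matrix per sharing point, learned jointly with the rest of the network.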
“…Unlike Hashimoto et al. (2017) and other previous work (Katiyar and Cardie 2017; Bekoulis et al. 2018; Augenstein, Ruder, and Søgaard 2018), we do not learn label embeddings, meaning that the (supervised) output/prediction of a layer is not directly fed to the following layer through an embedding learned during training. Nonetheless, sharing embeddings and stacking hierarchical encoders allows us to share the supervision from each task along the full structure of our model and achieve state-of-the-art performance.…”
Section: Related Work (mentioning)
confidence: 99%
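
The alternative this statement describes, shared embeddings with stacked hierarchical encoders where each task is supervised at its own depth and no label embeddings are fed forward, might look roughly like this (the two-task setup, GRU encoders, and layer sizes are assumptions for illustration):

```python
import torch
import torch.nn as nn

class HierarchicalMTL(nn.Module):
    """Sketch of stacked encoders with per-layer task heads: a low-level
    task is supervised after encoder 1, a higher-level task after
    encoder 2, so each loss reaches the shared layers below it. Only
    hidden states flow upward; predictions are never fed forward."""

    def __init__(self, vocab=10000, dim=64, n_low=5, n_high=3):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)            # shared embeddings
        self.enc1 = nn.GRU(dim, dim, batch_first=True)
        self.enc2 = nn.GRU(dim, dim, batch_first=True)
        self.low_head = nn.Linear(dim, n_low)          # e.g. a tagging task
        self.high_head = nn.Linear(dim, n_high)        # e.g. classification

    def forward(self, tokens):
        x = self.emb(tokens)
        h1, _ = self.enc1(x)
        h2, _ = self.enc2(h1)                          # stacked on enc1
        low_logits = self.low_head(h1)                 # per token, layer 1
        high_logits = self.high_head(h2.mean(dim=1))   # pooled, layer 2
        return low_logits, high_logits

model = HierarchicalMTL()
low, high = model(torch.randint(0, 10000, (2, 12)))
print(low.shape, high.shape)  # torch.Size([2, 12, 5]) torch.Size([2, 3])
```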
“…In addition, many works have attempted to learn the relationship through a matrix space or utilize additional regularization to increase the model's learning capability. For example, Augenstein et al. (2018) propose to leverage unlabeled or auxiliary data for better text classification. They first design a label embedding layer to learn a relationship space between disparate labels.…”
Section: Multi-task Learning (mentioning)
confidence: 99%
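
The "relationship space between disparate labels" referenced here can be read directly off a jointly trained label embedding by comparing embeddings across tasks. A toy illustration with invented labels and random, untrained vectors:

```python
import torch
import torch.nn.functional as F

# Toy label-embedding matrices for two tasks (labels and values invented).
sentiment = torch.randn(3, 64)  # negative / neutral / positive
stance = torch.randn(4, 64)     # favor / against / neither / comment

# The "relationship space": cosine similarity between every stance label
# and every sentiment label. After joint training, high scores would mark
# related labels (e.g. favor ~ positive); here the vectors are random.
rel = F.cosine_similarity(stance.unsqueeze(1), sentiment.unsqueeze(0), dim=-1)
print(rel.shape)  # torch.Size([4, 3])
```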