Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, 2017
DOI: 10.18653/v1/e17-2026

Identifying beneficial task relations for multi-task learning in deep neural networks

Abstract: Multi-task learning (MTL) in deep neural networks for NLP has recently received increasing interest due to some compelling benefits, including its potential to efficiently regularize models and to reduce the need for labeled data. While it has brought significant improvements in a number of NLP tasks, mixed results have been reported, and little is known about the conditions under which MTL leads to gains in NLP. This paper sheds light on the specific task relations that can lead to gains from MTL models over …

Cited by 179 publications (179 citation statements)
References 11 publications
“…Furthermore, introducing LEX (cf. Section 4) as auxiliary task was generally helpful; on the other hand, POS did not seem to help, corroborating previous findings (Alonso and Plank, 2017; Bingel and Søgaard, 2017).…”
Section: Results (supporting)
confidence: 88%
“…A common approach involves training the MTL model on different task specific corpus by randomly switching between the different tasks and updating both the task-specific and shared parameters based on its corpus. [15], [17], [19] employed this training strategy. A joint end-to-end model training strategy is mostly suitable for cases where the alternative tasks are treated as auxiliary objectives on the same dataset.…”
Section: Related Work (mentioning)
confidence: 99%
“…This can be beneficial in a number of scenarios. Previous work has shown benefits, e.g., in cases where one has tasks which are closely related to one another (Bjerva, 2017a,b), when one task can help another escape a local minimum (Bingel and Søgaard, 2017), and when one has access to some unsupervised signal which can be beneficial to the task at hand (Rei, 2017). A common approach to MTL is the application of hard parameter sharing, in which some set of parameters in a model is shared between several tasks.…”
Section: Related Work 2.1 Multitask Learning (mentioning)
confidence: 99%
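
Both excerpts above describe the same standard setup: hard parameter sharing, in which a shared encoder feeds one task-specific output head per task, trained by randomly switching between tasks and updating the shared parameters together with the parameters of the sampled task. What follows is a minimal sketch of that setup in PyTorch; the task names, label counts, and random batches are hypothetical placeholders rather than anything taken from the paper or the citing works.

import random
import torch
import torch.nn as nn

class HardSharingModel(nn.Module):
    """Shared embedding + LSTM encoder with one linear head per task (hard parameter sharing)."""
    def __init__(self, vocab_size, emb_dim, hidden_dim, task_label_sizes):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)                 # shared
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)  # shared
        self.heads = nn.ModuleDict({                                   # task-specific
            task: nn.Linear(hidden_dim, n_labels)
            for task, n_labels in task_label_sizes.items()
        })

    def forward(self, token_ids, task):
        states, _ = self.encoder(self.embed(token_ids))
        return self.heads[task](states)  # per-token logits for the selected task

# Hypothetical setup: two sequence-labelling tasks sharing one encoder.
tasks = {"main_task": 23, "aux_task": 17}  # task name -> number of labels
model = HardSharingModel(vocab_size=10000, emb_dim=64, hidden_dim=128,
                         task_label_sizes=tasks)
optimizer = torch.optim.Adam(model.parameters())
loss_fn = nn.CrossEntropyLoss()

# In practice each task would draw batches from its own corpus;
# random tensors stand in for a batch of token ids and labels here.
def sample_batch(task, batch_size=8, seq_len=20, vocab_size=10000):
    tokens = torch.randint(0, vocab_size, (batch_size, seq_len))
    labels = torch.randint(0, tasks[task], (batch_size, seq_len))
    return tokens, labels

for step in range(100):
    task = random.choice(list(tasks))     # randomly switch between tasks
    tokens, labels = sample_batch(task)   # batch from that task's corpus
    logits = model(tokens, task)
    loss = loss_fn(logits.reshape(-1, logits.size(-1)), labels.reshape(-1))
    optimizer.zero_grad()
    loss.backward()   # gradients reach the shared encoder and this task's head only
    optimizer.step()

Because the shared encoder is updated on every step regardless of which task was drawn, whatever useful signal the auxiliary task carries is transferred to the main task through those shared weights; this is the mechanism behind the gains (and the mixed results) that the paper investigates.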