Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
DOI: 10.18653/v1/2021.findings-acl.26

Enhancing Transformers with Gradient Boosted Decision Trees for NLI Fine-Tuning

Abstract: Transfer learning has become the dominant paradigm for many natural language processing tasks. In addition to models being pretrained on large datasets, they can be further trained on intermediate (supervised) tasks that are similar to the target task. For small Natural Language Inference (NLI) datasets, language modelling is typically followed by pretraining on a large (labelled) NLI dataset before fine-tuning with each NLI subtask. In this work, we explore Gradient Boosted Decision Trees (GBDTs) as an altern…

Cited by 2 publications (3 citation statements) | References 51 publications

“…Previous studies have demonstrated benefits of utilizing learned representations from Transformer Networks as inputs for gradient boosting models, leading to improved outcomes compared to directly using the predictions of a Transformer Network 19,36. We thus follow a similar approach here.…”
Section: Results
confidence: 99%
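
The statement above describes the general pattern: take a Transformer's learned representations as input features for a gradient boosting model rather than using the Transformer's own predictions. The following is a minimal sketch of that pattern, assuming the Hugging Face transformers library, PyTorch, and scikit-learn; the model name, example sentence pairs, and labels are illustrative placeholders, not the setup used in the cited papers.

```python
# Sketch: feed a Transformer's learned [CLS] representations to a GBDT classifier.
# Assumes Hugging Face `transformers`, PyTorch, and scikit-learn; all inputs are placeholders.
import numpy as np
import torch
from sklearn.ensemble import GradientBoostingClassifier
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.eval()

def embed(premises, hypotheses):
    """Return the final-layer [CLS] vector for each premise/hypothesis pair."""
    batch = tokenizer(premises, hypotheses, padding=True, truncation=True,
                      return_tensors="pt")
    with torch.no_grad():
        out = encoder(**batch)
    return out.last_hidden_state[:, 0, :].numpy()

# Toy NLI-style pairs (0 = entailment, 2 = contradiction), purely illustrative.
premises = ["A man is playing a guitar.", "A dog runs through a field."]
hypotheses = ["Someone is making music.", "The dog is sleeping indoors."]
labels = np.array([0, 2])

# Train a gradient boosting classifier on the learned representations
# instead of using the Transformer's classification head.
features = embed(premises, hypotheses)
gbdt = GradientBoostingClassifier(n_estimators=100).fit(features, labels)
print(gbdt.predict(features))
```
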
“…Previous studies have demonstrated benefits of utilizing learned representations from Transformer Networks as inputs for gradient boosting models, leading to improved outcomes compared to directly using the predictions of a Transformer Network [18,37]. We thus follow a similar approach here.…”
Section: ProSmith Feeds the Learned Representations to Gradient Boost...
confidence: 99%
“…The final ProSmith model does not use these predictions directly, but instead uses the learned joint protein-small molecule representations to train gradient boosting models. This strategy was motivated by previous studies that showed superior results when adding a gradient boosting step [18,37]. To investigate whether this additional step indeed contributed to the superior performance of ProSmith, we re-examined the enzyme-substrate prediction task, comparing the model performance of directly using the end-to-end trained multimodal Transformer Network with that of a gradient boosting model that takes the learned joint protein-small molecule representation from this Network as input.…”
Section: ProSmith's Model Architecture Has an Important Impact on Mod...
confidence: 99%
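
The last statement describes an ablation: comparing the end-to-end trained Transformer's predictions against a gradient boosting model trained on the Transformer's learned joint representations. A minimal sketch of that comparison follows, under the same library assumptions as above; transformer_predict and extract_representations are hypothetical helpers standing in for the trained multimodal Transformer and are not part of either paper's code.

```python
# Sketch: compare end-to-end Transformer predictions vs. GBDT on learned representations.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def compare(transformer_predict, extract_representations, inputs, labels):
    """Return (end-to-end accuracy, representation + GBDT accuracy) on a held-out split."""
    X = extract_representations(inputs)  # learned joint representations
    X_tr, X_te, y_tr, y_te, _in_tr, in_te = train_test_split(
        X, labels, inputs, test_size=0.2, random_state=0)

    # Baseline: use the end-to-end trained Transformer's predictions directly.
    acc_end_to_end = accuracy_score(y_te, transformer_predict(in_te))

    # Alternative: train a gradient boosting model on the learned representations.
    gbdt = GradientBoostingClassifier().fit(X_tr, y_tr)
    acc_gbdt = accuracy_score(y_te, gbdt.predict(X_te))
    return acc_end_to_end, acc_gbdt
```
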