Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.504

The Cascade Transformer: an Application for Efficient Answer Sentence Selection

Abstract: Large transformer-based language models have been shown to be very effective in many classification tasks. However, their computational complexity prevents their use in applications requiring the classification of a large set of candidates. While previous works have investigated approaches to reduce model size, relatively little attention has been paid to techniques to improve batch throughput during inference. In this paper, we introduce the Cascade Transformer, a simple yet effective technique to adapt trans…
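
The abstract describes trimming the candidate set at intermediate transformer layers so that deeper, costlier layers process fewer inputs. The following is a minimal sketch of that cascading idea, not the authors' implementation: the names encoder_layers, rankers, and drop_ratio are assumptions, and the per-question batching used in the paper is simplified to a single flat batch of candidates.

# Minimal sketch of the cascading idea from the abstract (illustrative only).
# Assumptions: each element of `encoder_layers` maps [batch, seq, dim] -> [batch, seq, dim],
# and `rankers` is a dict {layer_index: small linear classifier} attached at selected depths.
import torch

@torch.no_grad()
def cascade_rerank(candidates, encoder_layers, rankers, drop_ratio=0.3):
    """Score a batch of candidate sentences, pruning the lowest-scoring
    fraction after each intermediate ranker so deeper layers see fewer inputs."""
    hidden = candidates                        # [num_candidates, seq_len, dim]
    keep = torch.arange(hidden.size(0))        # indices of surviving candidates
    scores = torch.zeros(candidates.size(0))   # last score computed for each candidate
    for i, layer in enumerate(encoder_layers):
        hidden = layer(hidden)
        if i in rankers:                       # partial classifier at this depth
            s = rankers[i](hidden[:, 0]).squeeze(-1)   # score from the first ([CLS]-like) token
            scores[keep] = s
            n_keep = max(1, int(s.size(0) * (1 - drop_ratio)))
            top = torch.topk(s, n_keep).indices        # keep the highest-scoring fraction
            hidden, keep = hidden[top], keep[top]
    return scores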


Cited by 40 publications (44 citation statements) · References 44 publications
“…One key limitation of BERT is its inability to handle long input sequences and hence difficulty in ranking texts beyond a certain length (e.g., "full-length" documents such as news articles). This limitation is addressed by a number of models (Nogueira and Cho, 2019; Akkalyoncu Yilmaz et al., 2019; Dai and Callan, 2019b; MacAvaney et al., 2019), and a simple retrieve-then-rerank approach can be elaborated into a multi-stage architecture with reranker pipelines (Nogueira et al., 2019a; Matsubara et al., 2020; Soldaini and Moschitti, 2020) that balance effectiveness and efficiency. On top of multi-stage ranking architectures, researchers have proposed additional innovations, including query expansion, document expansion (Nogueira et al., 2019b; Nogueira and Lin, 2019) and term importance prediction (Dai and Callan, 2019a, 2020).…”
Section: Multi-stage Ranking Architectures (mentioning, confidence: 99%)
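
The retrieve-then-rerank pipeline described in the excerpt above can be summarized in a few lines. This is a hedged illustration of the general pattern, not any cited system: retrieve, rerankers, and the stage depths are hypothetical placeholders supplied by the caller.

# Illustrative multi-stage ranking pipeline: a cheap first-stage retriever followed by
# progressively more expensive rerankers, each keeping only its top-k candidates.
from typing import Callable, List, Tuple

def multi_stage_rank(query: str,
                     retrieve: Callable[[str, int], List[str]],
                     rerankers: List[Tuple[Callable[[str, str], float], int]],
                     depth: int = 1000) -> List[str]:
    """Run first-stage retrieval, then successive rerank-and-truncate stages."""
    candidates = retrieve(query, depth)      # cheap first stage, e.g. a lexical searcher
    for score, keep_k in rerankers:          # later stages: costlier models, fewer documents
        candidates = sorted(candidates, key=lambda d: score(query, d), reverse=True)[:keep_k]
    return candidates

A caller might pass a lexical searcher as retrieve and a list such as [(cheap_scorer, 100), (transformer_scorer, 10)] as rerankers, so each stage trades added cost for a smaller candidate pool; this is the effectiveness/efficiency balance the excerpt refers to.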
“…Finally, our word-relatedness encoder can replace the standard attention to enhance the speed of the fast attention-based approaches, resulting in a fast and accurate network in the class of fast methods. Our research will gain more and more importance also in light of improving the efficiency of large architectures using our models for sequential re-ranking (Matsubara et al., 2020; Soldaini and Moschitti, 2020).…”
Section: Introduction (mentioning, confidence: 99%)
“…We can see that in both datasets, early exiting is able to accelerate inference by ∼2.5× while maintaining the original model effectiveness. It is worth noting that in Cascade Transformer (CT) (Soldaini and Moschitti, 2020), only a part of the development set is used for evaluation, and therefore the scores are not directly comparable. However, in terms of relative performance, our model appears to achieve a bit higher inference speedup with a comparable score degradation.…”
Section: Results (mentioning, confidence: 99%)
“…Our work differs from them by using an early exiting strategy that specializes for document ranking. Another related work that focuses on retrieval is Cascade Transformer (Soldaini and Moschitti, 2020), where a fixed proportion of samples are dropped after each layer. In contrast, our work drops samples based on their scores, and empirically we are able to achieve higher inference speedups.…”
Section: Related Work (mentioning, confidence: 99%)
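
The two excerpts above contrast dropping a fixed proportion of candidates after each layer (the Cascade Transformer rule) with dropping candidates whose scores fall below a threshold (the early-exit rule of the citing work). The snippet below only illustrates that difference on plain (candidate, score) pairs; drop_ratio and threshold are placeholder values.

def prune_fixed_proportion(scored, drop_ratio=0.3):
    """Cascade-Transformer-style rule: discard a fixed fraction of the
    lowest-scoring candidates at each stage, regardless of score values."""
    scored = sorted(scored, key=lambda x: x[1], reverse=True)
    keep = max(1, int(len(scored) * (1 - drop_ratio)))
    return scored[:keep]

def prune_by_score(scored, threshold=0.5):
    """Score-based rule (as in the citing work): drop candidates whose current
    score falls below a threshold, so the number pruned varies per batch."""
    return [(c, s) for c, s in scored if s >= threshold]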