Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2019)
DOI: 10.18653/v1/n19-1355

AutoSeM: Automatic Task Selection and Mixing in Multi-Task Learning (Guo et al., 2019)

Abstract: Multi-task learning (MTL) has achieved success over a wide range of problems, where the goal is to improve the performance of a primary task using a set of relevant auxiliary tasks. However, when the usefulness of the auxiliary tasks w.r.t. the primary task is not known a priori, the success of MTL models depends on the correct choice of these auxiliary tasks and also a balanced mixing ratio of these tasks during alternate training. These two problems could be resolved via manual intuition or hyper-parameter tuning…
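The abstract's "balanced mixing ratio of these tasks during alternate training" can be pictured with a small sketch: at each update step, one task is sampled according to the mixing ratios and a batch from that task is used for the gradient step. The task names, ratio values, and the train_step_fn hook below are hypothetical placeholders, not details taken from the paper.

import random

# Hypothetical mixing ratios: the probability of drawing a training batch from
# each task at every step of alternate training. "primary" is the main task;
# the others are candidate auxiliary tasks. The values are illustrative only.
MIXING_RATIOS = {"primary": 0.6, "aux_paraphrase": 0.25, "aux_nli": 0.15}

def alternate_training(num_steps, train_step_fn):
    """Alternate multi-task training: at each step, pick one task according to
    the mixing ratios and run a single update on a batch from that task."""
    tasks = list(MIXING_RATIOS)
    weights = [MIXING_RATIOS[t] for t in tasks]
    for step in range(num_steps):
        task = random.choices(tasks, weights=weights, k=1)[0]
        train_step_fn(task, step)  # stand-in for one gradient update on `task`

if __name__ == "__main__":
    # Dummy train step that only records which task was sampled.
    counts = {}
    alternate_training(1000, lambda task, step: counts.update({task: counts.get(task, 0) + 1}))
    print(counts)  # counts come out roughly proportional to MIXING_RATIOS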

Cited by 12 publications (3 citation statements). References 27 publications (26 reference statements).

Citation statements (ordered by relevance):
“…The per-target-word loss is then interpolated with instance prediction (one or two sentences) loss using a coefficient λ. Such a multi-task learning objective has been shown to improve performance on a number of tasks (Guo et al., 2019).…”
Section: Fine-grained Content Selection (citation type: mentioning)
confidence: 99%
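The interpolation described in the statement above amounts to a weighted sum of the two losses. The sketch below is a minimal illustration with plain floats standing in for batch-averaged losses; the coefficient value and the exact combination form (plain weighted sum vs. convex combination) are assumptions, not details taken from the cited work.

def interpolated_loss(per_word_loss, instance_loss, lam=0.5):
    """Multi-task objective: per-target-word loss interpolated with the
    instance-level prediction loss via a coefficient lambda (lam).

    lam=0.5 is a placeholder; depending on the exact formulation this could
    also be written as (1 - lam) * per_word_loss + lam * instance_loss."""
    return per_word_loss + lam * instance_loss

# Example with dummy scalar losses.
print(interpolated_loss(2.31, 0.87, lam=0.3))  # 2.31 + 0.3 * 0.87 ≈ 2.571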
“…with Bayesian optimization (Ruder and Plank, 2017), but these methods are time- and resource-consuming due to their reliance on multitask experiments involving all the candidate tasks. AUTOSEM (Guo et al., 2019) combines the two settings into one method, selecting candidate tasks with Thompson sampling and deciding the ratio with which to draw training instances from the selected tasks via a Gaussian Process. Despite the higher quality of the auxiliary task sets it generates, AUTOSEM is still costly, similar to Ruder and Plank (2017).…”
Section: Introduction (citation type: mentioning)
confidence: 99%
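The Thompson-sampling side of the task selection described above can be sketched as a Beta-Bernoulli bandit over candidate auxiliary tasks: each task keeps a posterior over its usefulness, the task with the highest posterior sample is trained on next, and its posterior is updated with a binary reward such as whether primary-task dev performance improved. This is a generic sketch of the technique, not AUTOSEM's actual implementation (which also fits a Gaussian Process over mixing ratios); the task names and reward simulation are invented for illustration.

import random

class ThompsonTaskSelector:
    """Beta-Bernoulli Thompson sampling over candidate auxiliary tasks."""

    def __init__(self, tasks):
        # Each task starts with a uniform Beta(1, 1) posterior: [alpha, beta].
        self.posteriors = {t: [1.0, 1.0] for t in tasks}

    def select(self):
        # Sample a usefulness estimate per task; train on the arg-max task.
        draws = {t: random.betavariate(a, b) for t, (a, b) in self.posteriors.items()}
        return max(draws, key=draws.get)

    def update(self, task, reward):
        # reward is 1 if training on `task` helped the primary task, else 0.
        if reward:
            self.posteriors[task][0] += 1
        else:
            self.posteriors[task][1] += 1

if __name__ == "__main__":
    selector = ThompsonTaskSelector(["aux_nli", "aux_paraphrase", "aux_sentiment"])
    # Simulated rewards: pretend aux_nli helps 70% of the time, the others 30%.
    true_help = {"aux_nli": 0.7, "aux_paraphrase": 0.3, "aux_sentiment": 0.3}
    for _ in range(500):
        t = selector.select()
        selector.update(t, random.random() < true_help[t])
    print(selector.posteriors)  # aux_nli should accumulate the most successes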
“…The contributions of this paper are three-fold: Following Guo et al. (2019), we use the 8 classification tasks in the GLUE benchmark (Wang et al., 2019), namely CoLA, MRPC, MNLI, QNLI, QQP, RTE, SST-2, and WNLI, in our main experiments. We apply the standard split of these datasets as Wang et al. (2019) describe.…”
Section: Introduction (citation type: mentioning)
confidence: 99%