2020
DOI: 10.48550/arxiv.2001.04063
Preprint

ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training

Cited by 37 publications (29 citation statements)
References 21 publications
“…We perform further pre-training on a 160GB unlabeled English corpus, including news, books, stories, and web text. It is similar to the corpora of well-known AR pre-training works such as ProphetNet (Qi et al., 2020) and BART (Lewis et al., 2019). The learning rate is set to 4e-4, with 366k training steps, a batch size of 2048, and distillation weight α = 0.5, on 16 NVIDIA Tesla V100 GPUs with 32GB memory.…”
Section: Pre-training Results
Citation type: mentioning (confidence: 99%)
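The excerpt above reports concrete pre-training hyperparameters. As a minimal sketch, they can be collected into a plain config; the trainer wiring and the exact way the distillation weight α blends the two losses are assumptions for illustration, not the cited work's actual implementation:

```python
# Hedged sketch: only the numeric values below come from the citation
# statement; the loss-blending convention is an assumption.

pretrain_config = {
    "learning_rate": 4e-4,   # reported learning rate
    "total_steps": 366_000,  # 366k pre-training steps
    "batch_size": 2048,      # global batch size
    "distill_alpha": 0.5,    # distillation weight alpha
    "num_gpus": 16,          # 16 x NVIDIA Tesla V100, 32GB memory each
}

def combined_loss(task_loss: float, distill_loss: float,
                  alpha: float = pretrain_config["distill_alpha"]) -> float:
    """One common way a distillation weight alpha is applied: a convex
    blend of the distillation loss and the task loss. Whether the cited
    work combines them exactly this way is an assumption."""
    return alpha * distill_loss + (1.0 - alpha) * task_loss

if __name__ == "__main__":
    # Example: blend a task loss of 2.3 with a distillation loss of 1.1.
    print(combined_loss(task_loss=2.3, distill_loss=1.1))
```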
“…Considering the sequence-to-sequence generation scenario, we denote the input and output sequences as (x, y). A typical neural sequence generation model, e.g., (Lewis et al., 2019; Song et al., 2019; Qi et al., 2020), encodes the input sequence x into a dense representation h (Eqn. 1) and decodes a sequence of tokens as the output y:…”
Section: Non-autoregressive Generation
Citation type: mentioning (confidence: 99%)
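The excerpt truncates before reproducing Eqn. 1. A plausible reconstruction of the standard encoder-decoder factorization it refers to, under the usual seq2seq conventions (the cited paper's exact notation is an assumption), is:

```latex
% Hedged reconstruction of the formulation the excerpt calls "Eqn. 1";
% requires amsmath. The cited paper's notation may differ.
\begin{align}
  h &= \mathrm{Encoder}(x)                                % encode x into h
  \\
  P(y \mid x) &= \prod_{t=1}^{|y|} P(y_t \mid y_{<t}, h)  % autoregressive decoding
  \intertext{whereas a non-autoregressive decoder drops the left-to-right dependency
  and predicts all tokens in parallel:}
  P(y \mid x) &= \prod_{t=1}^{|y|} P(y_t \mid h)
\end{align}
```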
“…This work draws on our rich experience in summarizing meeting conversations (Liu, 2009, 2013; Koay et al., 2020) and building neural abstractive systems (Lebanoff et al., 2019, 2020; Song et al., 2020). We have chosen an abstractive system over its extractive counterpart for this task, as neural abstractive systems have seen significant progress (Raffel et al., 2019; Lewis et al., 2020; Qi et al., 2020). Not only can an abstract accurately convey the content of the podcast, but it is also in a succinct form that is easy to read on a smartphone.…”
Section: Our Summary
Citation type: mentioning (confidence: 99%)