2020
DOI: 10.1162/tacl_a_00313

Leveraging Pre-trained Checkpoints for Sequence Generation Tasks

Abstract: Unsupervised pre-training of large neural models has recently revolutionized Natural Language Processing. By warm-starting from the publicly released checkpoints, NLP practitioners have pushed the state-of-the-art on multiple benchmarks while saving significant amounts of compute time. So far the focus has been mainly on the Natural Language Understanding tasks. In this paper, we demonstrate the efficacy of pre-trained checkpoints for Sequence Generation. We developed a Transformer-based sequence-to-sequence m…
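
The paper's central recipe is to initialize both sides of a Transformer encoder-decoder from publicly released checkpoints and then fine-tune on the generation task. As a rough illustration only, the sketch below shows one way to do this with the Hugging Face transformers library in PyTorch; the checkpoint name and the library choice are assumptions for the example, not the authors' original TensorFlow setup.

# Minimal sketch (not the authors' code): warm-starting an encoder-decoder
# from public BERT checkpoints, in the spirit of the BERT2BERT setup.
from transformers import BertTokenizer, EncoderDecoderModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Encoder and decoder are both initialized from the same public checkpoint;
# the decoder's cross-attention weights are newly (randomly) initialized.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# The warm-started model is then fine-tuned on the target generation task
# (e.g. summarization); here we only show that it can already run generation.
inputs = tokenizer("A long source document to be summarized.", return_tensors="pt")
outputs = model.generate(inputs.input_ids, max_length=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))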

Cited by 326 publications (293 citation statements)
References 29 publications
“…Devlin et al. [2019] proposed BERT based on masked language modeling and next sentence prediction, and achieved state-of-the-art results on multiple NLP tasks. There are also some works on pre-training encoder-decoder models for language generation [Rothe et al., 2019; Edunov et al., 2019; Liu and Lapata, 2019]. The main difference between our generation model and others is that our model uses a pre-trained BERT model on the encoder side and a non-pre-trained Transformer on the decoder side, and we fine-tune the encoder and train the decoder using two separate optimizers.…”
Section: Quanzhi Li and Qiong Zhang
confidence: 99%
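
The setup this statement describes — a pre-trained BERT encoder feeding a randomly initialized Transformer decoder, with the two parts updated by separate optimizers — can be sketched roughly as follows. This is an assumed PyTorch illustration, not the cited authors' code; the layer sizes, learning rates, and the training_step helper are placeholders for the example.

# Sketch under assumptions: pre-trained encoder, fresh decoder, two optimizers.
import torch
from torch import nn
from transformers import BertModel

encoder = BertModel.from_pretrained("bert-base-uncased")      # pre-trained
decoder = nn.TransformerDecoder(                              # randomly initialized
    nn.TransformerDecoderLayer(d_model=768, nhead=12, batch_first=True),
    num_layers=6,
)
out_proj = nn.Linear(768, encoder.config.vocab_size)

# Two separate optimizers: a small learning rate to fine-tune the pre-trained
# encoder, a larger one to train the fresh decoder (values are illustrative).
enc_opt = torch.optim.AdamW(encoder.parameters(), lr=2e-5)
dec_opt = torch.optim.AdamW(
    list(decoder.parameters()) + list(out_proj.parameters()), lr=1e-4
)

def training_step(src_ids, src_mask, tgt_embeds, labels):
    # Encoder hidden states for the whole source; the decoder cross-attends to them.
    memory = encoder(input_ids=src_ids, attention_mask=src_mask).last_hidden_state
    hidden = decoder(tgt=tgt_embeds, memory=memory)  # causal masking omitted for brevity
    logits = out_proj(hidden)
    loss = nn.functional.cross_entropy(logits.transpose(1, 2), labels)
    enc_opt.zero_grad(); dec_opt.zero_grad()
    loss.backward()
    enc_opt.step(); dec_opt.step()
    return loss.item()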
“…The loss of the joint learning model is the sum of the losses of the three tasks. There are previous studies on pre-training encoder-decoder models for language generation [Rothe et al., 2019; Edunov et al., 2019; Liu and Lapata, 2019], and some of them also use different optimizers for different components.…”
Section: Model Training and Inference
confidence: 99%
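
A hedged sketch of the joint-training idea in this statement: the total loss is the plain sum of three task losses, and a single backward pass feeds two different optimizers for different components. The task_a_loss/task_b_loss/task_c_loss methods and both optimizers are hypothetical placeholders, not the cited paper's API.

import torch

def joint_training_step(model, batch, encoder_opt, head_opt):
    # Hypothetical task-specific loss methods; the joint loss is simply
    # the sum of the three task losses computed on the same batch.
    loss = model.task_a_loss(batch) + model.task_b_loss(batch) + model.task_c_loss(batch)

    encoder_opt.zero_grad()
    head_opt.zero_grad()
    loss.backward()        # one backward pass through the shared model
    encoder_opt.step()     # e.g. gentle fine-tuning of the pre-trained component
    head_opt.step()        # e.g. faster updates for the task-specific components
    return loss.item()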
“…The hidden states generated by the encoder for the entire input sequence are passed to the decoder, thus allowing the decoder to attend over the entire input sequence during each decoding step. This model serves as our primary baseline, as it is identical to the BERT2RND model in Rothe et al (2019). We use the same hyperparameters as Rothe et al (2019), which were selected after extensive tuning.…”
Section: Models
confidence: 99%
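
The mechanism this statement relies on — every decoding step attending over the hidden states of the entire input sequence — is encoder-decoder cross-attention. Below is a minimal single-head sketch in PyTorch; the projection matrices and tensor shapes are illustrative assumptions, not the BERT2RND implementation from Rothe et al. (2019).

import torch
import torch.nn.functional as F

def cross_attention(dec_states, enc_states, w_q, w_k, w_v):
    # dec_states: (T_dec, d) decoder positions; enc_states: (T_enc, d) encoder
    # hidden states for the entire input sequence.
    q = dec_states @ w_q
    k = enc_states @ w_k
    v = enc_states @ w_v
    scores = q @ k.T / (q.shape[-1] ** 0.5)
    weights = F.softmax(scores, dim=-1)   # each decoder step weighs every input position
    return weights @ v

d = 768
w_q, w_k, w_v = (torch.randn(d, d) * 0.02 for _ in range(3))
enc = torch.randn(128, d)   # hidden states for a 128-token source sequence
dec = torch.randn(10, d)    # states for the 10 tokens decoded so far
context = cross_attention(dec, enc, w_q, w_k, w_v)   # -> shape (10, 768)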