2021
DOI: 10.1007/978-3-030-84186-7_31
A Robustly Optimized BERT Pre-training Approach with Post-training

Cited by 1,669 publications (2,534 citation statements). References 17 publications.
“…Finally, although RoBERTa (Liu et al, 2019) has exhibited improvements over BERT on many different tasks, we found that, in this case, using pretrained RoBERTa instead of BERT does not yield much improvement. The predictions of the two models are highly correlated, with 0.95 correlation over all datasets' predictions.…”
Section: Results (mentioning)
confidence: 65%
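The 0.95 figure in the excerpt above is an agreement measure over the two models' predictions pooled across datasets. The sketch below shows how such a correlation could be computed; the arrays, values, and variable names are illustrative placeholders, not the citing paper's data.

```python
# Hypothetical sketch: measuring agreement between two fine-tuned models'
# predictions, as in the cited BERT vs. RoBERTa comparison.
import numpy as np
from scipy.stats import pearsonr

# Placeholder prediction scores (e.g. positive-class probabilities) collected
# over all evaluation examples from each model.
bert_scores = np.array([0.91, 0.12, 0.77, 0.05, 0.64])
roberta_scores = np.array([0.88, 0.15, 0.81, 0.07, 0.60])

# Pearson correlation over the pooled predictions; a value near 1.0
# (the citing paper reports 0.95) indicates the two models behave alike.
r, p_value = pearsonr(bert_scores, roberta_scores)
print(f"Pearson r = {r:.2f} (p = {p_value:.3g})")
```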
“…The embedding of [CLS] is mainly learned from NSP. However, a recent study shows that NSP does not contribute much to the sentence representation learning [33]. SBERT-WK can make use of the existing semantics in BERT as much as possible, but it cannot increase the semantics in BERT.…”
Section: Related Work (mentioning)
confidence: 97%
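For context on the excerpt above, the sketch below shows how the [CLS] embedding it refers to is typically taken from BERT's output using the Hugging Face transformers library; the checkpoint and input sentence are placeholders, not the cited SBERT-WK setup.

```python
# Minimal sketch of extracting the [CLS] embedding discussed in the quote.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Sentence embeddings from BERT.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# The first token of the last hidden state is [CLS]; its vector is the
# sentence-level representation that NSP pre-training shapes.
cls_embedding = outputs.last_hidden_state[:, 0, :]
print(cls_embedding.shape)  # torch.Size([1, 768])
```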
“…Representative autoregressive language models are word2vec (Mikolov et al., 2013), Glove (Pennington et al., 2014), ELMO (Peters et al., 2018), GPT (Radford et al., 2018), GPT-2 (Radford et al., 2019) and XLNet (Yang et al., 2019), and they are more suitable for text generation task. Representative autoencoding language models are Bert (Devlin et al., 2018), Bert-wwm (Cui et al., 2019), RoBERTa (Liu et al., 2019), ALBERT (Lan et al., 2019), ERNIE (Sun et al., 2019a), ERNIE-2 (Sun et al., 2019b) and ELECTRA (Clark et al., 2020), and they are more suitable for entity and relation extraction.…”
Section: Related Work (mentioning)
confidence: 99%
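As a rough illustration of the autoregressive vs. autoencoding distinction drawn in the excerpt above, the sketch below loads one model of each kind through the Hugging Face transformers pipelines; the specific checkpoints (gpt2, bert-base-uncased) and prompts are examples chosen here, not the configurations used in the cited works.

```python
from transformers import pipeline

# Autoregressive model: predicts the next token left-to-right,
# which suits text generation.
generator = pipeline("text-generation", model="gpt2")
print(generator("Relation extraction aims to", max_new_tokens=20)[0]["generated_text"])

# Autoencoding model: reconstructs masked tokens from bidirectional context,
# which suits token-level tasks such as entity and relation extraction.
filler = pipeline("fill-mask", model="bert-base-uncased")
for candidate in filler("BERT is an [MASK] language model."):
    print(candidate["token_str"], round(candidate["score"], 3))
```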