2021
DOI: 10.48550/arxiv.2104.08006
Preprint

ProphetNet-X: Large-Scale Pre-training Models for English, Chinese, Multi-lingual, Dialog, and Code Generation

Abstract: Pre-training techniques are now ubiquitous in the natural language processing field. ProphetNet is a pre-training-based natural language generation method that shows strong performance on English text summarization and question generation tasks. In this paper, we extend ProphetNet to other domains and languages and present the ProphetNet family of pre-training models, named ProphetNet-X, where X can be English, Chinese, Multi-lingual, and so on. We pre-train a cross-lingual generation model, ProphetNet-Multi,…
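
A minimal usage sketch of the kind of generation the abstract describes, assuming the English summarization checkpoint microsoft/prophetnet-large-uncased-cnndm is available through the Hugging Face Transformers library; the checkpoint name and decoding hyperparameters below are illustrative assumptions, not details taken from this report:

from transformers import ProphetNetForConditionalGeneration, ProphetNetTokenizer

# Checkpoint name is an assumption for illustration (English model fine-tuned on CNN/DailyMail).
model_name = "microsoft/prophetnet-large-uncased-cnndm"
tokenizer = ProphetNetTokenizer.from_pretrained(model_name)
model = ProphetNetForConditionalGeneration.from_pretrained(model_name)

article = (
    "ProphetNet is a sequence-to-sequence pre-training model that predicts "
    "future n-grams instead of only the next token."
)
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=512)

# Beam-search decoding with illustrative hyperparameters.
summary_ids = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    num_beams=4,
    max_length=64,
    early_stopping=True,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))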

Cited by 11 publications (19 citation statements) · References 40 publications
“…CPM (Zhang et al., 2020c) maintains a similar model architecture as GPT with 2.6 billion parameters. CPM-2 (Zhang et al., 2021) scales up to 11 billion parameters and employs knowledge inheritance from existing models to accelerate the pre-training process. PanGu-α (Zeng et al., 2021) is a huge model, with up to 200 billion parameters.…”
Section: Large-scale Pre-trained Language Models
“…Besides the English version, PLATO-2 has one Chinese dialogue model of 363 million parameters, exhibiting prominent improvements over the classical chatbot of XiaoIce. There are some other Chinese dialogue models on a similar modest scale, including CDial-GPT and ProphetNet-X (Qi et al., 2021). Recently, one Chinese dialogue model of EVA (Zhou et al., 2021) is developed under the architecture of Seq2Seq, with up to 2.8 billion parameters.…”
Section: Pre-trained Dialogue Models