Glyce: Glyph-vectors for Chinese Character Representations

Meng, Yuxian; Wu, Wei; Wang, Fei; Li, Xiaoya; Nie, Ping; Yin, Fan; Li, Muyu; Han, Qinghong; Sun, Xiaofei; Li, Jiwei

doi:10.48550/arxiv.1901.10125

Cited by 12 publications (10 citation statements)

References 53 publications


“…Experiment results show that ERNIE 3.0 also outperforms the current SoTA system by a great margin. […] SKEP [80], RoBERTa-wwm-ext-large [81] (marked as RoBERTa*), ALBERT [82], MacBERT [83], Zen 2.0 [84], Glyce [85] and crossed BERT siamese BiGRU [86] (marked as BERT_BiGRU*).…”

Section: Fine-tuning On Natural Language Understanding Tasks (mentioning, confidence: 99%)

ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation

Sun, Wang, Feng et al. (2021), Preprint


Pre-trained models have achieved state-of-the-art results in various Natural Language Processing (NLP) tasks. Recent works such as T5 [1] and GPT-3 have shown that scaling up pre-trained language models can improve their generalization abilities. Particularly, the GPT-3 model with 175 billion parameters shows its strong task-agnostic zero-shot/few-shot learning capabilities. Despite their success, these large-scale models are trained on plain texts without introducing knowledge such as linguistic knowledge and world knowledge. In addition, most large-scale models are trained in an auto-regressive way. As a result, this kind of traditional fine-tuning approach demonstrates relatively weak performance when solving downstream language understanding tasks. In order to solve the above problems, we propose a unified framework named ERNIE 3.0 for pre-training large-scale knowledge enhanced models. It fuses auto-regressive network and auto-encoding network, so that the trained model can be easily tailored for both natural language understanding and generation tasks with zero-shot learning, few-shot learning or fine-tuning. We trained the model with 10 billion parameters on a 4TB corpus consisting of plain texts and a large-scale knowledge graph. Empirical results show that the model outperforms the state-of-the-art models on 54 Chinese NLP tasks, and its English version achieves the first place on the SuperGLUE [3] benchmark (July 3, 2021), surpassing the human performance by +0.8% (90.6% vs. 89.8%).


“…In addition, some researchers extract the glyph features of Chinese characters from their graphic aspects. Meng […] Chinese characters as images and used CNNs to obtain their representations [27]. FGN was proposed by Xuan et al […] On the one hand, a new CNN structure was proposed, called CGS-CNN.…”

Section: Related Work (mentioning, confidence: 99%)

AIP: A Named Entity Recognition Method Combining Glyphs and Sounds

Liu (2022), ACM Trans. Asian Low-Resour. Lang. Inf. Process.


In recent years, informatization efforts in various fields have produced a large volume of Chinese electronic text, and identifying specific entities in this text has become a major research focus. Most existing methods extract the glyph features of Chinese characters from their radicals, an approach with known limitations. This paper extracts features of Chinese characters from three aspects (glyph features, phonetic features, and character features) and improves the conventional extraction method for each kind of feature. A new named entity recognition method, AIP, is proposed: it transforms Chinese characters into images for glyph feature extraction, divides pinyin into initials, vowels (finals), and tones for phonetic feature extraction, and fine-tunes A Lite BERT (ALBERT) for character feature extraction. This paper compares AIP with mainstream neural network models on Chinese named entity recognition, using commonly used data sets as well as a domain-specific data set. AIP achieved better results than related methods, with F1 values of 94.4% and 80.5% on the two data sets, respectively, which validates the model's versatility.
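The abstract says AIP divides pinyin into initials, vowels (finals), and tones but does not give the procedure. A minimal sketch of that split, assuming tone-numbered pinyin input such as `zhong1` (tone 5 for the neutral tone), might look like this. The function name `split_pinyin` is hypothetical, and treating `y` and `w` as initials is one common convention, not necessarily AIP's.

```python
from typing import Tuple

# Mandarin initials; two-letter initials come first so that "zh", "ch"
# and "sh" are matched before "z", "c" and "s". Treating "y" and "w"
# as initials is an assumption (one of several conventions).
INITIALS = ("zh", "ch", "sh",
            "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_pinyin(syllable: str) -> Tuple[str, str, int]:
    """Split a tone-numbered pinyin syllable into (initial, final, tone)."""
    syllable = syllable.lower()
    if syllable and syllable[-1].isdigit():
        tone = int(syllable[-1])
        body = syllable[:-1]
    else:
        tone = 5  # no tone digit: treat as the neutral tone
        body = syllable
    for initial in INITIALS:
        # Require a non-empty final after the initial.
        if body.startswith(initial) and len(body) > len(initial):
            return initial, body[len(initial):], tone
    return "", body, tone  # zero-initial syllables such as "ai" or "er"
```

For example, `split_pinyin("zhong1")` returns `("zh", "ong", 1)`, and the zero-initial syllable `"ai4"` returns `("", "ai", 4)`. Each of the three parts could then be embedded separately before being combined into a phonetic feature.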


“…Compared with recent research, we pay more attention to the radical meaning and the frame structure of Chinese characters. The Glyce-Bert encoder [19] which utilized a Tianzige-CNN structure is adopted as the visual information encoder. To some extent, it fits the origin of Chinese characters better than other methods, such as stroke sequence [8,17] or object detection [12].…”

Section: Embedding Layer (mentioning, confidence: 99%)
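The snippet above adopts Glyce's Tianzige-CNN as the visual encoder. As we read the Glyce design, that CNN reduces a character's feature map to a 2 × 2 grid, echoing the four squares of the tianzige (田字格) grid used for handwriting practice. The plain-Python sketch below shows only that pooling step for a single channel. The convolution layers, kernel sizes, and channel counts are omitted, and the name `tianzige_pool` is hypothetical.

```python
from typing import List

FeatureMap = List[List[float]]  # one channel, indexed [row][col]


def tianzige_pool(channel: FeatureMap) -> FeatureMap:
    """Max-pool one H x W channel into a 2 x 2 grid, one cell per quadrant."""
    h, w = len(channel), len(channel[0])
    if h % 2 or w % 2:
        raise ValueError("expects even height and width")
    half_h, half_w = h // 2, w // 2
    grid: FeatureMap = [[0.0, 0.0], [0.0, 0.0]]
    for gi in range(2):          # top / bottom half
        for gj in range(2):      # left / right half
            grid[gi][gj] = max(
                channel[r][c]
                for r in range(gi * half_h, (gi + 1) * half_h)
                for c in range(gj * half_w, (gj + 1) * half_w)
            )
    return grid
```

For the 4 × 4 channel `[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]`, the sketch returns `[[6, 8], [14, 16]]`, one maximum per quadrant. Keeping a coarse spatial layout, rather than pooling to a single value, preserves where in the character each stroke-like feature occurs.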

“…where MLP is a linear layer, $h_i \in \mathbb{R}^{d_t}$ and $d_t$ is the output dimension of the Transformer encoder. Following Meng et al [19], we combine the loss of token classification task and glyph classification task as the final training objective. The training objective $\mathcal{L}$ is given as follows:…”

Section: Output (mentioning, confidence: 99%)
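The snippet cuts off before the objective itself, so its exact form is not shown here. The Glyce paper combines the task loss with its auxiliary glyph (image) classification loss using a weight that decays during training, $\mathcal{L} = (1-\lambda(t))\,\mathcal{L}_{\text{task}} + \lambda(t)\,\mathcal{L}_{\text{cls}}$ with $\lambda(t) = \lambda_0 \lambda_1^{t}$. The sketch below implements that weighting for scalar losses. Whether the citing paper uses the same schedule is an assumption, and the function names are hypothetical.

```python
def glyph_aux_weight(step: int, lambda0: float, lambda1: float) -> float:
    """Decaying auxiliary weight: lambda(t) = lambda0 * lambda1 ** t."""
    return lambda0 * lambda1 ** step


def combined_loss(task_loss: float, glyph_loss: float, step: int,
                  lambda0: float, lambda1: float) -> float:
    """L = (1 - lambda(t)) * L_task + lambda(t) * L_glyph."""
    lam = glyph_aux_weight(step, lambda0, lambda1)
    return (1.0 - lam) * task_loss + lam * glyph_loss
```

With arbitrary values, `combined_loss(2.0, 4.0, step=1, lambda0=0.5, lambda1=0.5)` gives 2.5, because the glyph weight has decayed from 0.5 to 0.25. The decay lets the glyph objective shape the encoder early on while the main task dominates later in training.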


General and Domain Adaptive Chinese Spelling Check with Error Consistent Pretraining

Liu, Cao, Geng et al. (2022), Preprint


The lack of labeled data is one of the major bottlenecks for Chinese Spelling Check (CSC). Existing research expands the supervised corpus by automatically generating training data from unlabeled text. However, there is a large gap between automatically generated corpora and real input scenarios. We therefore develop ECSpell, a competitive general speller that adopts an Error Consistent masking strategy to create pretraining data. This strategy constrains the error types of automatically generated sentences to be consistent with those found in real scenarios. Experimental results indicate that our model outperforms previous state-of-the-art models on the general benchmark. Moreover, spellers often work within a particular domain in real life, and because of the many uncommon domain terms, experiments on our newly built domain-specific datasets show that general models perform poorly. Inspired by the common practice of input methods, we propose adding an alterable user dictionary to handle the zero-shot domain adaptation problem. Specifically, we attach a User Dictionary guided inference module (UD) to a general token-classification-based speller. Our experiments demonstrate that ECSpell_UD, namely ECSpell combined with UD, surpasses all other baselines by a large margin, even approaching the performance on the general benchmark.

CCS Concepts: • Computing methodologies → Natural language processing.
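The abstract describes the Error Consistent masking strategy only at a high level. A minimal sketch of the underlying idea, assuming a character-level confusion set of phonetically or visually similar characters, is to corrupt a clean sentence only with substitutions drawn from that set, so that every injected error resembles a real typing error. The function name, the confusion-set format, and the corruption rate are hypothetical; the paper's actual error-type ratios are not reproduced here.

```python
import random
from typing import Dict, List, Tuple


def make_pretraining_pair(sentence: str,
                          confusion: Dict[str, List[str]],
                          rate: float,
                          rng: random.Random) -> Tuple[str, str]:
    """Corrupt `sentence` by swapping some characters for confusable ones.

    Only characters with an entry in `confusion` are eligible, so injected
    errors stay consistent with real error types. Returns
    (corrupted_input, target), where target is the original sentence.
    """
    chars = list(sentence)
    for i, ch in enumerate(chars):
        candidates = confusion.get(ch)
        if candidates and rng.random() < rate:
            chars[i] = rng.choice(candidates)
    return "".join(chars), sentence
```

For example, with the homophone pair 在/再 as the only confusion entry and `rate=1.0`, `make_pretraining_pair("我在家", {"在": ["再"]}, 1.0, random.Random(0))` returns `("我再家", "我在家")`. A token-classification speller is then trained to map each corrupted character back to the target.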
