Using Chinese Glyphs for Named Entity Recognition

Sehanobish, Arijit; Song, Chan Hee

doi:10.48550/arxiv.1909.09922

Cited by 1 publication

(1 citation statement)

References 0 publications


“…Motivated by Meng et al (2019) and Sehanobish and Song (2019)'s exploration on using glyph images for Chinese named entity recognition (NER) and Chinese word segmentation (CWS), we employ a glyph feature extractor to extract glyph features for Chinese characters. We make use of 8106 Chinese glyph images released by (Sehanobish and Song, 2019). To take advantage of powerful pre-trained models and avoid training from scratch, VGG19 (Simonyan and Zisserman, 2014) pretrained on ImageNet is adopted as the backbone of the glyph feature extractor.…”

Section: Glyph Feature Extractor (mentioning)

confidence: 99%

PHMOSpell: Phonological and Morphological Knowledge Guided Chinese Spelling Check

Huang, Li, Jiang et al. 2021

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Confer


Chinese Spelling Check (CSC) is a challenging task due to the complex characteristics of Chinese characters. Statistics reveal that most Chinese spelling errors belong to phonological or visual errors. However, previous methods rarely utilize phonological and morphological knowledge of Chinese characters or heavily rely on external resources to model their similarities. To address the above issues, we propose a novel end-to-end trainable model called PHMOSpell, which promotes the performance of CSC with multi-modal information. Specifically, we derive pinyin and glyph representations for Chinese characters from audio and visual modalities respectively, which are integrated into a pre-trained language model by a well-designed adaptive gating mechanism. To verify its effectiveness, we conduct comprehensive experiments and ablation tests. Experimental results on three shared benchmarks demonstrate that our model consistently outperforms previous state-of-the-art models.

1 pinyin is the official phonetic system of Mandarin Chinese, which usually consists of three parts: initials, finals and tones.

2 radical is the basic building blocks of all Chinese charac-
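
The abstract mentions an adaptive gating mechanism for integrating pinyin and glyph representations into a language model but does not specify it. As a rough illustration of the general idea only, the sketch below shows one common form of gated fusion: a sigmoid gate computed from both vectors scales how much of the auxiliary feature is added to the text representation. The function names, the scalar gate, and the residual form are all assumptions made for illustration and are not taken from PHMOSpell.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(
    text_vec: Sequence[float],
    feature_vec: Sequence[float],
    gate_weights: Sequence[float],
    gate_bias: float = 0.0,
) -> List[float]:
    """Add an auxiliary feature vector to a text vector, scaled by a gate.

    Hypothetical helper: the gate is a single scalar computed from the
    concatenation of both vectors, so the model can learn how much of the
    auxiliary (e.g. glyph or pinyin) signal to let through.
    """
    if len(text_vec) != len(feature_vec):
        raise ValueError("text_vec and feature_vec must have the same length")
    concat = list(text_vec) + list(feature_vec)
    if len(gate_weights) != len(concat):
        raise ValueError("gate_weights must match the concatenated length")

    # Linear score over the concatenated vectors, squashed into (0, 1).
    score = sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias
    gate = sigmoid(score)

    # Residual form: the text vector is kept, the feature is added in part.
    return [t + gate * f for t, f in zip(text_vec, feature_vec)]
```

With all gate weights and the bias at zero the gate is 0.5, so half of the auxiliary feature is added; a strongly negative bias closes the gate and the output stays close to the text vector. In a trained model the gate parameters would be learned rather than set by hand.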
