UMRSpell: Unifying the Detection and Correction Parts of Pre-trained Models towards Chinese Missing, Redundant, and Spelling Correction

He, Zheyu; Zhu, Yong; Wang, Linlin; Xu, Lu

doi:10.18653/v1/2023.acl-long.570

Cited by 3 publications

References 0 publications

PPTopicPLM: plug-and-play topic-enhanced pre-trained language model for short-text rumor detection

Zeng, Li, 2024. J Supercomput

Span Confusion is All You Need for Chinese Spelling Correction

Ye, Jia, Tian et al. 2024. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management

Self-Distillation and Pinyin Character Prediction for Chinese Spelling Correction Based on Multimodality

He, Liu, Liu et al. 2024. Applied Sciences

Chinese spelling correction (CSC) is a pivotal and long-standing task in natural language processing: it detects and corrects spelling errors in text and serves as a foundation for many language-related tasks. Many CSC methods leverage multimodal information, including the character itself, its sound, and its shape, to connect incorrect characters with their corrections. Research indicates that a majority of spelling errors stem from pinyin similarity, with character similarity accounting for half of the errors. Effectively modeling the relationship between characters and their pinyin is therefore a key challenge in CSC. In this study, we propose enhancing the CSC task with an auxiliary pinyin character prediction task. We apply an adaptive weighting method to the pinyin character prediction task so that predictions are handled at a finer granularity and the two prediction tasks stay balanced. The proposed model, SPMSpell, uses ChineseBERT as its encoder to capture multimodal features simultaneously. It incorporates three parallel decoders: one for character prediction, one for pinyin prediction, and one for the self-distillation module. To mitigate potential overfitting to pinyin, a self-distillation method is introduced to prioritize character information in predictions. Extensive experiments on three SIGHAN benchmark test sets show that the proposed model attains superior performance. These results support the adaptive weighted pinyin character prediction task and underscore the effectiveness of the self-distillation module.
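The abstract describes a joint objective over three parallel heads but does not give its formulas. The block below is a minimal sketch of how such an objective could be combined, not the SPMSpell implementation. Everything in it is an assumption made for illustration: the function names, the per-token weight `1 - p_char[target]` (lean on the pinyin task more where the character head is unsure), the use of a KL term that pulls the distillation head toward the character head, and the `distill_coef` parameter.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target):
    """Negative log-likelihood of the target index."""
    return -math.log(probs[target])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions on the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def joint_loss(char_logits, pinyin_logits, distill_logits,
               char_targets, pinyin_targets, distill_coef=1.0):
    """Average per-token loss over three heads (hypothetical formulation).

    Each *_logits argument is a list with one logit vector per token.
    """
    total = 0.0
    rows = zip(char_logits, pinyin_logits, distill_logits,
               char_targets, pinyin_targets)
    for c_log, p_log, d_log, c_t, p_t in rows:
        p_char = softmax(c_log)
        p_pinyin = softmax(p_log)
        p_distill = softmax(d_log)

        # ASSUMPTION: adaptive weight grows when the character head is
        # uncertain about the gold character, and shrinks when it is confident.
        weight = 1.0 - p_char[c_t]

        loss_char = cross_entropy(p_char, c_t)
        loss_pinyin = cross_entropy(p_pinyin, p_t)
        # ASSUMPTION: the character head acts as a fixed teacher distribution.
        loss_distill = kl_divergence(p_char, p_distill)

        total += loss_char + weight * loss_pinyin + distill_coef * loss_distill
    return total / len(char_targets)
```

In this sketch, a token that the character head already predicts confidently contributes almost no pinyin loss, which is one simple way to keep the auxiliary task from dominating; the paper's actual weighting scheme may differ.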
