Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.317
Attention Is All You Need for Chinese Word Segmentation

Abstract: Taking the greedy decoding algorithm as given, this work focuses on further strengthening the model itself for Chinese word segmentation (CWS), which results in an even faster and more accurate CWS model. Our model consists of an attention-only stacked encoder and a decoder light enough for greedy segmentation, plus two highway connections for smoother training, in which the encoder is composed of a newly proposed Transformer variant, the Gaussian-masked Directional (GD) Transformer, and a biaffine attention scorer.
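
To make the abstract's central idea concrete, here is a minimal sketch of a single Gaussian-masked, directional self-attention head. This is an assumption-based illustration, not the authors' implementation: it assumes the Gaussian mask penalizes attention scores by squared token distance and that "directional" means a head attends only forward or only backward; the function name, the `sigma` parameter, and the shapes are hypothetical.

```python
# Minimal sketch (not the paper's code) of one Gaussian-masked,
# directional self-attention head.
import numpy as np

def gaussian_directional_attention(Q, K, V, sigma=1.0, direction="forward"):
    """Q, K, V: (seq_len, d) arrays for one head; returns (seq_len, d)."""
    seq_len, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                       # scaled dot-product

    # Gaussian locality mask: nearby token pairs keep higher scores
    # (applied as a log-space penalty proportional to squared distance).
    pos = np.arange(seq_len)
    dist = pos[:, None] - pos[None, :]                  # signed distance i - j
    scores = scores - (dist ** 2) / (2.0 * sigma ** 2)

    # Directional mask: forward heads see only j <= i, backward only j >= i.
    if direction == "forward":
        scores = np.where(dist >= 0, scores, -np.inf)
    else:
        scores = np.where(dist <= 0, scores, -np.inf)

    # Row-wise softmax over the masked scores.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```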

Cited by 24 publications (21 citation statements).
References 32 publications (63 reference statements).
“…Aiming at not only keeping competitive performance on benchmarks but also reducing the complexity of the CWS methods, our proposed framework consists of two essential modules: a student model and a teacher model, as shown in Figure 1. There is an obvious performance gap between the model based on PLMs and the lightweight model (Duan and Zhao, 2020). The OOV issue is the main reason for the gap.…”
Section: Proposed Framework
Confidence: 99%
“…-Transformer. This paper adopts a modified Transformer which follows the previous study by Duan and Zhao (2020). The modified Transformer changes the multi-head self-attention to the multi-head Gaussian directional attention.…”
Section: Appendix A Model Architecture
Confidence: 99%
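
To illustrate the substitution this excerpt describes (multi-head self-attention replaced by multi-head Gaussian directional attention), here is a hedged multi-head wrapper that reuses the single-head sketch shown after the abstract. The even/odd forward-backward head split, the projection matrices `W_q`, `W_k`, `W_v`, `W_o`, and the `n_heads` default are illustrative assumptions, not the configuration reported by Duan and Zhao (2020).

```python
# Sketch of a multi-head layer built from the single-head
# gaussian_directional_attention() function defined earlier.
import numpy as np

def multi_head_gaussian_directional(X, W_q, W_k, W_v, W_o, n_heads=4, sigma=1.0):
    """X: (seq_len, d_model); W_q/W_k/W_v/W_o: (d_model, d_model) projections."""
    seq_len, d_model = X.shape
    d_head = d_model // n_heads
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    outputs = []
    for h in range(n_heads):
        sl = slice(h * d_head, (h + 1) * d_head)
        # Assumed split: alternate heads attend forward and backward.
        direction = "forward" if h % 2 == 0 else "backward"
        outputs.append(
            gaussian_directional_attention(Q[:, sl], K[:, sl], V[:, sl],
                                           sigma=sigma, direction=direction)
        )
    # Concatenate head outputs and project back to the model dimension.
    return np.concatenate(outputs, axis=-1) @ W_o       # (seq_len, d_model)
```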