Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.317
Attention Is All You Need for Chinese Word Segmentation

Abstract: Taking the greedy decoding algorithm as given, this work focuses on further strengthening the model itself for Chinese word segmentation (CWS), which results in an even faster and more accurate CWS model. Our model consists of an attention-only stacked encoder and a decoder light enough for greedy segmentation, plus two highway connections for smoother training, in which the encoder is composed of a newly proposed Transformer variant, the Gaussian-masked Directional (GD) Transformer, and a biaffine attention scorer.
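
To make the abstract's central idea concrete, here is a minimal sketch of a single Gaussian-masked, directional self-attention head. This is an assumption-based illustration, not the authors' implementation: it assumes the Gaussian mask penalizes attention scores by squared token distance and that "directional" means a head attends only forward or only backward; the function name, the `sigma` parameter, and the shapes are hypothetical.

```python
# Minimal sketch (not the paper's code) of one Gaussian-masked,
# directional self-attention head.
import numpy as np

def gaussian_directional_attention(Q, K, V, sigma=1.0, direction="forward"):
    """Q, K, V: (seq_len, d) arrays for one head; returns (seq_len, d)."""
    seq_len, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                       # scaled dot-product

    # Gaussian locality mask: nearby token pairs keep higher scores
    # (applied as a log-space penalty proportional to squared distance).
    pos = np.arange(seq_len)
    dist = pos[:, None] - pos[None, :]                  # signed distance i - j
    scores = scores - (dist ** 2) / (2.0 * sigma ** 2)

    # Directional mask: forward heads see only j <= i, backward only j >= i.
    if direction == "forward":
        scores = np.where(dist >= 0, scores, -np.inf)
    else:
        scores = np.where(dist <= 0, scores, -np.inf)

    # Row-wise softmax over the masked scores.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```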

Cited by 24 publications (21 citation statements).
References 32 publications (63 reference statements).
“…Aiming at not only keeping competitive performance on benchmarks but also reducing the complexity of the CWS methods, our proposed framework consists of two essential modules: a student model and a teacher model, as shown in Figure 1. There is an obvious performance gap between the model based on PLMs and the lightweight model (Duan and Zhao, 2020). The OOV issue is the main reason for the gap.…”
Section: Proposed Framework
Confidence: 99%
“…-Transformer. This paper adopts a modified Transformer which follows the previous study by Duan and Zhao (2020). The modified Transformer changes the multi-head self-attention to the multi-head Gaussian directional attention.…”
Section: Appendix A Model Architecture
Confidence: 99%
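
To illustrate the substitution this excerpt describes (multi-head self-attention replaced by multi-head Gaussian directional attention), here is a hedged multi-head wrapper that reuses the single-head sketch shown after the abstract. The even/odd forward-backward head split, the projection matrices `W_q`, `W_k`, `W_v`, `W_o`, and the `n_heads` default are illustrative assumptions, not the configuration reported by Duan and Zhao (2020).

```python
# Sketch of a multi-head layer built from the single-head
# gaussian_directional_attention() function defined earlier.
import numpy as np

def multi_head_gaussian_directional(X, W_q, W_k, W_v, W_o, n_heads=4, sigma=1.0):
    """X: (seq_len, d_model); W_q/W_k/W_v/W_o: (d_model, d_model) projections."""
    seq_len, d_model = X.shape
    d_head = d_model // n_heads
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    outputs = []
    for h in range(n_heads):
        sl = slice(h * d_head, (h + 1) * d_head)
        # Assumed split: alternate heads attend forward and backward.
        direction = "forward" if h % 2 == 0 else "backward"
        outputs.append(
            gaussian_directional_attention(Q[:, sl], K[:, sl], V[:, sl],
                                           sigma=sigma, direction=direction)
        )
    # Concatenate head outputs and project back to the model dimension.
    return np.concatenate(outputs, axis=-1) @ W_o       # (seq_len, d_model)
```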