Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing 2018
DOI: 10.18653/v1/d18-1273

A Hybrid Approach to Automatic Corpus Generation for Chinese Spelling Check

Abstract: Chinese spelling check (CSC) is a challenging yet meaningful task, which not only serves as a preprocessing step in many natural language processing (NLP) applications, but also facilitates reading and understanding of running texts in people's daily lives. However, to utilize data-driven approaches for CSC, there is one major limitation: annotated corpora are not sufficient for applying algorithms and building models. In this paper, we propose a novel approach of constructing CSC corpus with automatically generated …

Cited by 93 publications (92 citation statements)
References 26 publications
“…As shown in Table 6, FASPell achieves state-of-the-art F1 performance on both detection level and correction level. It is better in precision than the model by Wang et al (2018) and better in recall than the model by Zhang et al (2015). In comparison with Zhao et al (2017), it is better by any metric.…”
Section: Performance
mentioning
confidence: 85%
“…2. insufficiency in utilizing character similarity. Since a cut-off threshold of quantified character similarity (Liu et al, 2010; Wang et al, 2018) is used to produce the confusion set, similar characters are actually treated indiscriminately in terms of their similarity. This means the information of character similarity is not sufficiently utilized.…”
Section: Related Work and Bottlenecks
mentioning
confidence: 99%
“…Hsieh et al (2015) propose to extract spelling error samples from the Google web 1T corpus. Wang et al (2018) propose the OCR-based and ASR-based methods to mimic human errors. They further proposed a pointer network to model the CSC task under the framework of a seq2seq model.…”
Section: Related Work
mentioning
confidence: 99%
“…The results with ‡ are reproduced by rerunning the released code and evaluation scripts on the standard CSC datasets. The Wang et al (2018) and calculate the performance on the character-level, which makes their results incomparable with other works.…”
mentioning
confidence: 92%
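
The cut-off threshold mentioned in the "Related Work and Bottlenecks" statement above can be illustrated with a minimal sketch. It is not taken from any of the cited papers: the similarity scores, the example characters, and the function names `confusion_set` and `ranked_candidates` are all hypothetical. It only shows why a threshold treats every surviving candidate alike, whereas keeping the scores preserves the ordering.

```python
# Hypothetical similarity scores in [0, 1]; the values are invented for illustration.
SIMILARITY = {
    ("天", "夫"): 0.90,
    ("天", "无"): 0.75,
    ("天", "大"): 0.60,
    ("天", "地"): 0.10,
}


def confusion_set(char, threshold):
    """Return every candidate whose similarity to `char` meets the cut-off.

    The result is an unordered set, so a candidate scoring 0.90 and one
    scoring 0.60 become indistinguishable once both pass the threshold.
    """
    return {
        cand
        for (src, cand), score in SIMILARITY.items()
        if src == char and score >= threshold
    }


def ranked_candidates(char):
    """Keep the scores instead, so more similar candidates rank first."""
    pairs = [
        (cand, score)
        for (src, cand), score in SIMILARITY.items()
        if src == char
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(confusion_set("天", 0.5))
    print(ranked_candidates("天"))
```

With a threshold of 0.5, the three candidates that pass are returned without their scores, which is the loss of information the statement describes; the ranked variant keeps the graded similarity available to whatever component selects the correction.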