Whole word masking (WWM), which masks all subwords corresponding to a word at once, makes a better English BERT model (Sennrich et al., 2016). For the Chinese language, however, there is no subword because each token is an atomic character. A Chinese word differs in that it is a compositional unit consisting of multiple characters. This difference motivates us to investigate whether WWM leads to better context understanding for Chinese BERT. To this end, we introduce two probing tasks related to grammatical error correction and ask pretrained models to revise or insert tokens in a masked language modeling manner. We construct a dataset that includes labels for 19,075 tokens in 10,448 sentences. We train three Chinese BERT models with standard character-level masking (CLM), WWM, and a combination of CLM and WWM, respectively. Our major findings are as follows: First, when one character needs to be inserted or replaced, the model trained with CLM performs the best. Second, when more than one character needs to be handled, WWM is the key to better performance. Finally, when fine-tuned on sentence-level downstream tasks, models trained with different masking strategies perform comparably.

* Work done during internship at Tencent AI Lab. * indicates equal contributions. † Corresponding author.

1 Next sentence prediction is the other pretraining task adopted in the original BERT paper. However, it is removed in some subsequent work such as RoBERTa. We do not consider next sentence prediction in this work.
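To make the contrast between CLM and WWM concrete, the following is a minimal sketch, not the authors' pretraining code. It assumes the sentence has already been split into words by some external segmenter. The function names `char_level_mask` and `whole_word_mask`, the `rate` parameter, and the example sentence are hypothetical. It also leaves out details of real BERT pretraining, such as the fixed masking budget and the random-replacement and keep-original cases.

```python
import random

MASK = "[MASK]"

def char_level_mask(words, rate, rng):
    """CLM sketch: every character is an independent masking candidate."""
    chars = [c for w in words for c in w]
    return [MASK if rng.random() < rate else c for c in chars]

def whole_word_mask(words, rate, rng):
    """WWM sketch: if a word is selected, all of its characters are masked."""
    out = []
    for w in words:
        if rng.random() < rate:
            out.extend([MASK] * len(w))  # mask the whole word at once
        else:
            out.extend(list(w))          # keep every character of the word
    return out

# Hypothetical pre-segmented sentence; the segmentation is an assumption.
words = ["我", "喜欢", "自然", "语言"]
rng = random.Random(0)
print(char_level_mask(words, 0.3, rng))
print(whole_word_mask(words, 0.3, rng))
```

Under CLM a multi-character word such as 喜欢 can end up partially masked, whereas under WWM its characters are either all masked or all kept.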