“…Recently, pre-trained language models (Peters et al., 2018; Devlin et al., 2019; Brown et al., 2020) have achieved promising performance on many NLP tasks. Beyond using the universal representations from pre-trained models in downstream tasks, several studies have shown the potential of pre-trained masked language models (MLMs), e.g., BERT (Devlin et al., 2019) and RoBERTa (Liu et al., 2019b), to serve as factual knowledge bases (Petroni et al., 2019; Bouraoui et al., 2020; Jiang et al., 2020b; Shin et al., 2020; Jiang et al., 2020a; Wang et al., 2020; Kassner and Schütze, 2020a; Kassner et al., 2020). For example, to extract the birthplace of Steve Jobs, we can query an MLM such as BERT with "Steve Jobs was born in [MASK]", where "Steve Jobs" is the subject of the fact, "was born in" is a prompt string for the relation "place-of-birth", and [MASK] is a placeholder for the object to be predicted.…”
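
The cloze-style query described above can be sketched as follows, assuming the Hugging Face `fill-mask` pipeline and a `bert-base-cased` checkpoint; these are illustrative choices, not necessarily the exact setup used in the cited work.

```python
# A minimal sketch of probing an MLM with a cloze prompt; the model name
# and prompt wording are assumptions for illustration only.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-cased")

# "Steve Jobs" is the subject, "was born in" expresses the relation
# "place-of-birth", and [MASK] is the slot for the object to predict.
predictions = fill_mask("Steve Jobs was born in [MASK].")

# Each prediction carries the filled token and its probability score.
for p in predictions:
    print(f"{p['token_str']:>12}  {p['score']:.3f}")
```

The highest-scoring filler for the [MASK] slot is then read off as the model's answer for the queried fact.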