BERT, developed by Google, is a neural network model that performs strongly on natural language processing tasks. Its pre-training rests on two major strategies: the "Masked Language Model," which captures word-level relationships, and "Next Sentence Prediction," which captures sentence-level relationships. In the masked language model, some words in a sentence are masked and BERT learns to predict the original words from the surrounding context. Two questions arise: why does reading text in both directions, forward and backward, make BERT's learning so effective, and how does bidirectional reading differ from unidirectional reading? BERT predicts each masked word using the surrounding words as context, and it combines forward and backward readings into a two-way prediction in order to increase precision. Moreover, the bidirectional reading technique can be applied to scenario planning, in particular to back-casting from a desired future. This paper clarifies these mechanisms.
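To make the masked language model concrete, the following is a minimal sketch using the Hugging Face transformers library and its pretrained bert-base-uncased checkpoint; both the library and the checkpoint are illustrative assumptions, not artifacts of this paper. The example masks one word and lets BERT rank candidate fillers using context from both sides of the mask.

```python
# A minimal sketch of BERT's masked language modeling, using the
# Hugging Face "transformers" library and the pretrained
# "bert-base-uncased" checkpoint (assumed here for illustration).
from transformers import pipeline

# The fill-mask pipeline asks BERT to predict the token hidden
# behind [MASK], using the words on BOTH sides as context.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Because BERT reads bidirectionally, both the left context
# ("The capital of") and the right context ("is Paris") inform
# the prediction for the masked position.
for prediction in fill_mask("The capital of [MASK] is Paris."):
    print(f"{prediction['token_str']:>10}  score={prediction['score']:.3f}")
```

A unidirectional (left-to-right) language model would see only "The capital of" when predicting the masked token; BERT's bidirectional reading also exploits "is Paris," which is what the two-way prediction described above refers to.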