“…26,28,29 BERT has been successfully used in NLP for learning contextual word vectors, in particular for text classification and next-sentence prediction.30–34 Unlike language models that capture context unidirectionally, BERT was designed as a bidirectional model that analyzes sentences in both the forward and backward directions and predicts words conditioned on all other words in a sentence.26,35 Given that next-sentence prediction is conceptually related to the AS extension task, BERT was chosen as the transformer architecture for R-group prediction.…”
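
To make the bidirectional conditioning concrete, the following is a minimal sketch (not part of the original work) of masked-word prediction with a pretrained BERT model via the Hugging Face transformers library; the checkpoint name and example sentence are assumptions chosen for illustration:

```python
# Minimal sketch (illustrative, not the paper's method): BERT predicts a
# masked token conditioned on the words both before AND after it, i.e.,
# using the full bidirectional context of the sentence.
from transformers import pipeline

# The "fill-mask" pipeline loads a pretrained BERT checkpoint
# (model name assumed here for illustration).
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Top candidate tokens for [MASK], each scored against the whole sentence.
for candidate in fill_mask("BERT predicts the [MASK] word from its surrounding context."):
    print(candidate["token_str"], round(candidate["score"], 3))
```

In the same spirit, the R-group prediction task described in the passage can be cast as filling in a masked position, with the rest of the analog series supplying the bidirectional context.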