“…Rather than learning representations of the question and the candidate answer separately, researchers have recently introduced various attention mechanisms into the answer selection task (Bian, Li, Yang, Chen, & Lin, 2017; Deng et al., 2019; Kim, Kang, & Kwak, 2019; Shen et al., 2018; Tay, Tuan, & Hui, 2018b), which focus better on the relevant parts of the input QA pairs. To enrich feature representations, researchers have integrated knowledge bases into neural networks, capturing more relevant information to improve performance (Deng et al., 2018; Guo et al., 2017; Shijia, Xu, & Xiang, 2018; F. Wang, Wu, Li, & Zhou, 2017; J. Wang, Wang, Zhang, & Yan, 2017; Zhu, Cheng, & Su, 2020). More recently, a new paradigm has emerged: achieving better performance with large pre-trained models (e.g., ELMo, BERT) (Li, Yu, Chen, & Li, 2019; Mozafari, Fatemi, & Nematbakhsh, 2019).…”
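To make the contrast concrete, the core idea shared by the attention-based approaches cited above is that answer representations are conditioned on the question (and vice versa) rather than encoded in isolation. The following is a minimal NumPy sketch of generic cross-attention between a question and a candidate answer; it illustrates the general mechanism only, not any specific cited model, and all names and dimensions are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(question, answer):
    """Let each answer token attend over all question tokens.

    question: (m, d) token embeddings; answer: (n, d) token embeddings.
    Returns (n, d) question-aware answer representations.
    """
    d = question.shape[1]
    scores = answer @ question.T / np.sqrt(d)   # (n, m) similarity scores
    weights = softmax(scores, axis=1)           # attention over question tokens
    return weights @ question                   # weighted sum of question embeddings

rng = np.random.default_rng(0)
q = rng.standard_normal((5, 8))   # 5 question tokens, embedding dim 8
a = rng.standard_normal((7, 8))   # 7 answer tokens, same dim
out = cross_attention(q, a)
print(out.shape)  # (7, 8): one question-aware vector per answer token
```

In practice, the resulting question-aware answer vectors (and, symmetrically, answer-aware question vectors) are pooled and scored to rank candidate answers.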