“…In the first research line, early methods usually design various features tailored to causal expressions, such as lexical and syntactic patterns (… Girju, 2013, 2014a,b), causality cues or markers (Riaz and Girju, 2010; Do et al., 2011; Hidey and McKeown, 2016), statistical information (Beamer and Girju, 2009; Hashimoto et al., 2014), and temporal patterns (Riaz and Girju, 2014a; Ning et al., 2018). Researchers then rely on large amounts of labeled data to reduce the feature-engineering effort and to learn diverse causal expressions (Hu et al., 2017; Hashimoto, 2019). To alleviate the annotation cost, recent methods leverage pre-trained language models (PLMs, e.g., BERT (Devlin et al., 2019)) for the event causality identification (ECI) task and have achieved state-of-the-art (SOTA) performance (Kadowaki et al., 2019; Zuo et al., 2020).…”