“…Early studies (Pitler et al, 2008;Lin et al, 2009Lin et al, , 2014 focused on extracting linguistic and semantic features from two discourse units. Recent research (Zhang et al, 2015;Ji and Eisenstein, 2015;Ji et al, 2016) tried to model compositional meanings of two discourse units by exploiting interactions between words in two units with more and more complicated neural network models, including the ones using neural tensor (Chen et al, 2016;Qin et al, 2016;Lei et al, 2017) and attention mechanisms Lan et al, 2017;). Another trend is to alleviate the shortage of annotated data by leveraging related external data, such as explicit discourse relations in PDTB Lan et al, 2017;Qin et al, 2017) and unlabeled data obtained elsewhere Lan et al, 2017), often in a multi-task joint learning framework.…”