“…We conducted experiments on six datasets: SNLI (Bowman et al., 2015), HELP (Yanaka et al., 2019b), MED (Yanaka et al., 2019a), MoNLI (Geiger et al., 2020), NatLog-2hop (Feng et al., 2020), and a compositional generalization dataset (Yanaka et al., 2020). The results show the model's superior capability in monotonicity inference, systematic generalization, and interpretability compared to previous models on these existing datasets, while the model maintains competitive performance on the generic SNLI test set.…”