“…With the advancements of neural methods in recent years, claims of fluency in summarization (Liu et al., 2017; Celikyilmaz et al., 2018), language modeling (Radford et al., 2019; Brown et al., 2020), response generation (Zhang et al., 2020; Hosseini-Asl et al., 2020) and human parity in machine translation (Hassan et al., 2018) have led to calls for finer-grained discourse-level evaluations (Läubli et al., 2018; Sharma et al., 2019; Popel et al., 2020), since traditional metrics such as BLEU and ROUGE are unable to measure text quality and readability (Paulus et al., 2018; Reiter, 2018). Coherence models that can evaluate machine-generated text have become the need of the hour.…”