“…Unsupervised methods typically design features based on the assumption that segments within the same topic are more coherent than those belonging to different topics, drawing on lexical cohesion (Hearst, 1997; Choi, 2000; Riedl and Biemann, 2012b), topic models (Misra et al., 2009; Riedl and Biemann, 2012a; Jameel and Lam, 2013; Du et al., 2013), and semantic embeddings (Glavaš et al., 2016; Solbiati et al., 2021; Xing and Carenini, 2021). In contrast, supervised models can achieve more precise predictions by automatically mining cues of topic shift from large amounts of labeled data, either through classification over pairs of sentences or chunks (Wang et al., 2017; Lukasik et al., 2020) or through sequence labeling over the whole input sequence (Koshorek et al., 2018; Badjatiya et al., 2018; Xing et al., 2020). However, the memory consumption and efficiency of neural models such as BERT (Kenton and Toutanova, 2019) can be limiting factors for modeling long documents as their length increases.…”