“…Video/text summarization Existing models are either supervised or unsupervised. Unsupervised summarization models in video [9,31,40,41,43,50,52,54,55] and text [10,26,25,3] domains aim to identify a small subset of key units (video-segments/sentences) that preserve the global content of the input, e.g., using criteria like diversity and representativeness. In contrast, supervised video [13,15,16,42,49,51] and text [37,6,28,30,46] summarization methods solve the same problem by employing ground-truth summaries as training targets.…”