“…An attention mechanism adaptively weighs the keys of different key-value pairs based on their relative importance to a given query to predict the most suitable responses to the query [45]. Depending on the data paradigm of the key, the value, and the query, attention mechanisms are used in a wide variety of tasks, including tasks in natural language understanding [9], text-based image and video retrieval [4], object and action recognition in images and videos [46,40], and visual question answering [57]. In the case of user-specific highlight detection, the key, value, and query need to be based on the video contents, i.e., follow the paradigm of content-based highlight detection [42,37,2] to perform meaningful retrieval of the highlightable clips per user.…”
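
The query-key-value weighting described in the excerpt can be sketched in a few lines. This is a minimal illustration, not the quoted work's method: it assumes scaled dot-product scoring (one common choice), and the function names `softmax` and `attend`, the variable names, and the toy clip vectors are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Weigh each key-value pair by its relevance to a single query.

    Assumption: relevance is the scaled dot product of query and key.
    Returns the attention weights and the weighted sum of the values.
    """
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
        for key in keys
    ]
    weights = softmax(scores)
    value_dim = len(values[0])
    output = [
        sum(w * value[j] for w, value in zip(weights, values))
        for j in range(value_dim)
    ]
    return weights, output


# Hypothetical toy data: three clips, each with a 2-d key and a one-hot value.
clip_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
clip_values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Hypothetical query standing in for one user's preference.
user_query = [2.0, 0.0]

weights, output = attend(user_query, clip_keys, clip_values)
```

In this toy setup the first and third clips share the query's dominant feature, so they receive equal weights that exceed the second clip's. Because the values are one-hot, `output` simply reproduces the weights. In a content-based setting, the keys, values, and query would all be derived from video features rather than written by hand.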