Less Is More: Learning Highlight Detection From Video Duration

Xiong, Bo; Kalantidis, Yannis; Ghadiyaram, Deepti; Grauman, Kristen

doi:10.1109/cvpr.2019.00135

Cited by 87 publications

(68 citation statements)

References 34 publications

(117 reference statements)

“…An AVAC could automatically keep a running tally of information the spectator may find interesting based on their reaction and the current state of play. To enhance the spectator experience, the AVAC may automatically generate highlight reels that effectively reflect the flow of the game or summarize the most exciting segments (Mahasseni, Lam, & Todorovic, 2017; Merler et al., 2018; Xiong, Kalantidis, Ghadiyaram, & Grauman, 2019; Yang et al., 2015; K. Zhang, Chao, Sha, & Grauman, 2016); moreover, the AVAC might suggest related games predicted to engage or interest the spectator.…”

Section: Crowd-sourced Data (mentioning)

confidence: 99%

Game Plan: What AI can do for Football, and What Football can do for AI

Tuyls

Omidshafiei

Müller

et al. 2021

JAIR

The rapid progress in artificial intelligence (AI) and machine learning has opened unprecedented analytics possibilities in various team and individual sports, including baseball, basketball, and tennis. More recently, AI techniques have been applied to football, due to a huge increase in data collection by professional teams, increased computational power, and advances in machine learning, with the goal of better addressing new scientific challenges involved in the analysis of both individual players’ and coordinated teams’ behaviors. The research challenges associated with predictive and prescriptive football analytics require new developments and progress at the intersection of statistical learning, game theory, and computer vision. In this paper, we provide an overarching perspective highlighting how the combination of these fields, in particular, forms a unique microcosm for AI research, while offering mutual benefits for professional teams, spectators, and broadcasters in the years to come. We illustrate that this duality makes football analytics a game changer of tremendous value, in terms of not only changing the game of football itself, but also in terms of what this domain can mean for the field of AI. We review the state-of-the-art and exemplify the types of analysis enabled by combining the aforementioned fields, including illustrative examples of counterfactual analysis using predictive models, and the combination of game-theoretic analysis of penalty kicks with statistical learning of player attributes. We conclude by highlighting envisioned downstream impacts, including possibilities for extensions to other sports (real and virtual).

“…As for domain-agnostic approach, Mendi et al propose motion strength (Mendi, Clemente, and Bayrak 2013) that operates uniformly on any video. Domain-specific approaches tailor highlights to the topic domain, and leverage video duration (Xiong et al 2019) and visual co-occurrence (Chu, Song, and Jaimes 2015) as the weak supervision signal, or leverage category-aware reconstruction loss (Yang et al 2015a). However, without human-guided signals, the results are not satisfying enough.…”

Section: Related Work Video Highlight Detection (mentioning)

confidence: 99%

“…Video highlight detection algorithms are generally categorized as either unsupervised or supervised methods. Unsupervised techniques create video highlights by employing heuristics, such as video duration (Xiong et al 2019) and visual co-occurrence (Chu, Song, and Jaimes 2015), to achieve desired characteristics. Without human-guided signals, however, the results are not satisfying enough.…”

Section: Introduction (mentioning)

confidence: 99%

“…To learn video highlight in a supervised fashion, as shown in Figure 1(a), current state-of-the-art methods (Yao, Mei, and Rui 2016; Jiao et al 2018; Xiong et al 2019) mainly utilize a pair-wise ranking constraint for two video segments with a contrastive relationship. Although these methods achieve promising results, they suffer from two problems: (1) Most existing approaches only focus on learning holistic visual representations of video segments but ignore object semantics for inferring video highlights.…”

Section: Introduction (mentioning)

confidence: 99%

Find Objects and Focus on Highlights: Mining Object Semantics for Video Highlight Detection via Graph Neural Networks

Zhang

Gao

Yang

et al. 2020

AAAI

With the increasing prevalence of portable computing devices, browsing unedited videos is time-consuming and tedious. Video highlight detection, which discovers moments of a user's major or special interest in a video, has the potential to significantly ease this situation. Existing methods suffer from two problems. Firstly, most existing approaches only focus on learning holistic visual representations of videos but ignore object semantics for inferring video highlights. Secondly, current state-of-the-art approaches often adopt the pairwise ranking-based strategy, which cannot exploit global information to infer highlights. Therefore, we propose a novel video highlight framework, named VH-GNN, to construct an object-aware graph and model the relationships between objects from a global view. To reduce computational cost, we decompose the whole graph into two types of graphs: a spatial graph to capture the complex interactions of objects within each frame, and a temporal graph to obtain an object-aware representation of each frame and capture the global information. In addition, we optimize the framework via a proposed multi-stage loss, where the first stage aims to determine the highlight probability and the second stage leverages the relationships between frames and focuses on hard examples from the former stage. Extensive experiments on two standard datasets provide strong evidence that VH-GNN obtains significant performance compared with state-of-the-art methods.

“…As another example, when it comes to semantics at a higher level than what is captured by visual appearance, closeup of a player in soccer can be considered important if it is immediately followed by a goal, while not so important when it occurs elsewhere. These and other issues discussed below make it an interesting research problem with several papers pushing the state-of-the-art for newer algorithms and model architectures [6, 10, 19, 32, 34, 36-38, 40] and datasets [8, 26, 29]. However, as noted by a few recent works several fundamental issues remain to be addressed.…”

Section: Introduction (mentioning)

confidence: 99%

Realistic Video Summarization through VISIOCITY

Kaushal

Kothawade

Iyer

et al. 2020

Proceedings of the 2nd International Workshop on AI for Smart TV Content Production, Access and Delivery

Automatic video summarization is still an unsolved problem due to several challenges. We take steps towards making it more realistic by addressing the following challenges. Firstly, the currently available datasets either have very short videos or have few long videos of only a particular type. We introduce a new benchmarking dataset called VISIOCITY which comprises of longer videos across six different categories with dense concept annotations capable of supporting different flavors of video summarization and other vision problems. Secondly, for long videos, human reference summaries, necessary for supervised video summarization techniques, are difficult to obtain. We present a novel recipe based on pareto optimality to automatically generate multiple reference summaries from indirect ground truth present in VISIOCITY. We show that these summaries are at par with human summaries. Thirdly, we demonstrate that in the presence of multiple ground truth summaries (due to the highly subjective nature of the task), learning from a single combined ground truth summary using a single loss function is not a good idea. We propose a simple recipe VISIOCITY-SUM to enhance an existing model using a combination of losses and demonstrate that it beats the current state of the art techniques. We also present a study of different desired characteristics of a good summary and demonstrate that a single measure (say F1) to evaluate a summary, as is the current typical practice, falls short in some ways. We propose an evaluation framework for better quantitative assessment of summary quality which is closer to human judgment than a single measure. We report the performance of a few representative techniques of video summarization on VISIOCITY assessed using various measures and bring out the limitation of the techniques and/or the assessment mechanism in modeling human judgment and demonstrate the effectiveness of our evaluation framework in doing so.
