2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.00778

Rethinking the Evaluation of Video Summaries

Abstract: Video summarization is a technique to create a short skim of the original video while preserving the main stories/content. There exists a substantial interest in automatizing this process due to the rapid growth of the available material. The recent progress has been facilitated by public benchmark datasets, which enable easy and fair comparison of methods. Currently the established evaluation protocol is to compare the generated summary with respect to a set of reference summaries provided by the dataset. In …
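As background, the established protocol mentioned in the abstract is usually implemented as a frame-level overlap measure: the generated summary and each reference summary are represented as binary frame-selection vectors and compared via precision, recall, and F1. The snippet below is a minimal sketch of that idea, not the benchmark code itself; the function name and toy data are illustrative, and both the average and the maximum over references are returned since both conventions appear in the literature.

```python
import numpy as np

def f1_against_references(pred, refs):
    """Frame-level F1 of a predicted summary against a set of reference summaries.

    pred : binary vector of length n_frames (1 = frame selected).
    refs : list of binary vectors, one per human reference summary.
    Returns (average F1, maximum F1) over the references.
    """
    pred = np.asarray(pred, dtype=bool)
    f1s = []
    for ref in refs:
        ref = np.asarray(ref, dtype=bool)
        overlap = np.logical_and(pred, ref).sum()
        precision = overlap / max(pred.sum(), 1)
        recall = overlap / max(ref.sum(), 1)
        f1 = 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)
        f1s.append(f1)
    return float(np.mean(f1s)), float(np.max(f1s))

# Toy example: a 10-frame video with a 3-frame summary and two reference summaries.
pred = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
refs = [[1, 1, 0, 0, 0, 0, 0, 0, 1, 0],
        [0, 0, 0, 1, 1, 0, 0, 0, 1, 0]]
print(f1_against_references(pred, refs))
```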

Cited by 115 publications (92 citation statements)
References 24 publications
“…Our experiments on two benchmark datasets showed that our hierarchical structure achieved the best performance among all methods known to us. In particular, evaluation using the rank order statistics recently proposed in [13] clearly showed the superiority of our proposed method. Also, our proposal requires only a smaller number of task-level annotations to train the Manager.…”
Section: Results (mentioning)
confidence: 64%
“…However, our proposal was not capable of the transfer task because subgoals may vary considerably between domains. We then evaluate performance using the rank order statistics proposed in [13] and introduced in Sec. 4.2, which is claimed to be a better evaluation metric because it removes the effect of post-processing.…”
Section: Discussion (mentioning)
confidence: 99%
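For context, the rank order statistics evaluation proposed in [13] correlates the predicted frame-importance scores directly with each human annotator's scores, using Kendall's tau and Spearman's rho, so the result does not depend on how scores are post-processed into a keyshot summary. Below is a minimal sketch of such an evaluation using SciPy; the function name and toy data are illustrative assumptions, not code from the cited works.

```python
import numpy as np
from scipy.stats import kendalltau, spearmanr

def rank_correlation(pred_scores, human_scores):
    """Average rank correlation between predicted frame-importance scores
    and each human annotator's score vector for the same video."""
    taus, rhos = [], []
    for ann in human_scores:
        tau, _ = kendalltau(pred_scores, ann)
        rho, _ = spearmanr(pred_scores, ann)
        taus.append(tau)
        rhos.append(rho)
    return float(np.mean(taus)), float(np.mean(rhos))

# Toy example with 6 frames and two annotators.
pred = [0.9, 0.1, 0.4, 0.8, 0.2, 0.6]
anns = [[0.8, 0.2, 0.5, 0.9, 0.1, 0.7],
        [0.7, 0.3, 0.4, 0.6, 0.2, 0.9]]
print(rank_correlation(pred, anns))
```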
“…A related problem is the fact that current supervised techniques are trained using a 'combined' ground truth summary, either in the form of combined scores from multiple ground truth summaries [4,12,37] or in the form of a set of ground truth selections, as in dP-PLSTM [37]. However, since there can be multiple correct answers (a reason for the low consistency between user summaries [13,23]), combining them into one misses out on the separate flavors captured by each of them. Combining many into one set of scores also runs the risk of giving more emphasis to 'importance' over and above other desirable characteristics of a summary such as continuity, diversity, etc.…”
Section: Introduction (mentioning)
confidence: 99%
“…Evaluation: With a desire to be comparable across techniques, almost all recent work evaluates results using the F1 score [4,12,39]. This approach of assessing a candidate summary vis-à-vis a ground truth summary sounds good, but it has the following limitations: 1) The user summaries are themselves inconsistent with each other, as already noted above [13,23]. As a workaround, the assessment is done with respect to the nearest neighbor [10,30].…”
Section: Introduction (mentioning)
confidence: 99%