Proceedings of the 2020 International Conference on Multimedia Retrieval 2020
DOI: 10.1145/3372278.3390742

HLVU: A New Challenge to Test Deep Understanding of Movies the Way Humans do

Abstract: In this paper we propose a new evaluation challenge and direction in the area of High-level Video Understanding. The challenge we are proposing is designed to test automatic video analysis and understanding, and how accurately systems can comprehend a movie in terms of actors, entities, events and their relationship to each other. A pilot High-Level Video Understanding (HLVU) dataset of open source movies was collected for human assessors to build a knowledge graph representing each of them. A set of queries …

Cited by 27 publications (9 citation statements)
References 8 publications
“…The HLVU dataset (Table 1) has 10 open source movies sampled from paper [4]. The training set includes four long and two short movies, while testing set includes two long and two short movies.…”
Section: Dataset (mentioning)
confidence: 99%
“…The entities provided in the HLVU dataset [4] include person, objects, locations and concepts for which relevant images are provided for mapping. The locations and object entities are localized within scenes using SIFT based feature matching to handle varying scales and crops.…”
Section: Object Detection and Mapping (mentioning)
confidence: 99%
“…The High-Level Video Understanding (HLVU) dataset [4] includes 10 movies that are suitable for researching the relationship between entities. [Figure 1: Architecture of the multi-modal fusion model] The HLVU dataset meets the important requirements for selecting movies such as the duration of the movies (different lengths of movies: 6 long movies and 4 short movies), the quality of the video, and the clarity of the storyline.…”
Section: Dataset (mentioning)
confidence: 99%
“…The challenge uses the recently introduced High Level Video Understanding (HLVU) dataset [5] which consists of 10 movies released under creative commons licenses with a total combined duration of 681 minutes. For each of the 10 videos, the dataset also provides cropped key-frames, showing either characters or locations which are relevant for the story told by the movies.…”
Section: Provided Data (mentioning)
confidence: 99%
“…mechanism. Detected faces are then compared against the provided example images in order to identify the visible person. Due to changes in size and orientation of the people on screen, as well as variations of overall image quality throughout the videos, we apply the face detection and identification method densely, i.e., on every frame of the videos.…”
Section: Data Pre-processing (mentioning)
confidence: 99%