ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
DOI: 10.1109/icassp39728.2021.9413394

Audiovisual Highlight Detection in Videos

Abstract: In this paper, we test the hypothesis that interesting events in unstructured videos are inherently audiovisual. We combine deep image representations for object recognition and scene understanding with representations from an audiovisual affect recognition model. To this set, we include content agnostic audio-visual synchrony representations and mel-frequency cepstral coefficients to capture other intrinsic properties of audio. These features are used in a modular supervised model. We present results from two…

Cited by 5 publications (1 citation statement)
References 22 publications
“…We use the 2048 dimension scene ResNet50 representations pre-trained on the Places365 dataset [4]. Previous studies have shown the benefits of these representations [13,25]. These Places embeddings serve as input for the subsequent training process (keeping Module A frozen).…”
Section: Methods
Citation type: mentioning
confidence: 99%