ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp39728.2021.9415048
Saliency-Driven Versatile Video Coding for Neural Object Detection

Abstract: Saliency-driven image and video coding for humans has gained importance in the recent past. In this paper, we propose such a saliency-driven coding framework for the video coding for machines task using the latest video coding standard Versatile Video Coding (VVC). To determine the salient regions before encoding, we employ the real-time-capable object detection network You Only Look Once (YOLO) in combination with a novel decision criterion. To measure the coding quality for a machine, the state-of-the-art ob…
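The abstract describes the pipeline only at a high level: a YOLO detector locates salient regions before encoding, and the encoder then spends more rate on those regions. The minimal sketch below illustrates one plausible realization of that idea, mapping detector boxes to per-CTU delta-QP offsets for a VVC-style encoder; the function name, the confidence-threshold stand-in for the paper's decision criterion, and the offset values are assumptions for illustration, not taken from the paper.

```python
import numpy as np

CTU_SIZE = 128            # VVC coding tree unit size in luma samples
QP_SALIENT_OFFSET = -5    # hypothetical delta-QP for salient CTUs
QP_BACKGROUND_OFFSET = 5  # hypothetical delta-QP for non-salient CTUs

def ctu_delta_qp_map(frame_h, frame_w, detections, conf_thresh=0.5):
    """Build a per-CTU delta-QP map from detector output.

    detections: iterable of (x1, y1, x2, y2, confidence) boxes in pixels.
    Returns a 2D integer array with one delta-QP entry per CTU.
    """
    n_rows = (frame_h + CTU_SIZE - 1) // CTU_SIZE
    n_cols = (frame_w + CTU_SIZE - 1) // CTU_SIZE
    qp_map = np.full((n_rows, n_cols), QP_BACKGROUND_OFFSET, dtype=int)

    for x1, y1, x2, y2, conf in detections:
        if conf < conf_thresh:  # simple stand-in for the paper's decision criterion
            continue
        # Mark every CTU overlapped by the bounding box as salient.
        r0, r1 = int(y1) // CTU_SIZE, min(int(y2) // CTU_SIZE, n_rows - 1)
        c0, c1 = int(x1) // CTU_SIZE, min(int(x2) // CTU_SIZE, n_cols - 1)
        qp_map[r0:r1 + 1, c0:c1 + 1] = QP_SALIENT_OFFSET

    return qp_map

# Example: one confident detection in a 1920x1080 frame.
print(ctu_delta_qp_map(1080, 1920, [(300, 200, 700, 600, 0.9)]))
```

The resulting map could be passed to an encoder that supports per-block QP adaptation; how the actual framework signals the saliency information to VVC is not specified in the excerpt above.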

Cited by 25 publications (14 citation statements)
References 16 publications
“…Saliency-driven image and video coding for humans has gained importance in the recent past. In [61], the authors propose a saliency-driven coding framework for the video coding for machines task using the latest video coding standard Versatile Video Coding (VVC). To determine the salient regions before encoding, they employ the real-time-capable object detector YOLO in combination with a novel decision criterion.…”
Section: Video Coding Scheme For Specific Computer Vision Tasks (mentioning)
confidence: 99%
“…Regarding (V), the number of considered distortions, existing studies consider at most 50 coding configurations, with the exception of Dejean-Servieres et al. [10]. This is because some studies only compare themselves to the HEVC Test Model (HM) or the VVC Test Model (VTM) with a few Quantization Parameter (QP) values after proposing a new method to reach better trade-offs between vision task performance and bitrate [14]- [17], [19], [25], [26]. Some papers also evaluate DNN resilience to JPEG/JPEG2000 compression [10], [12], [15], [18], [21], [24], [27], [28], [30], AVC [1], [31] or auto-encoders [18], [26], [27], but no paper considers all mentioned image and video codec generations in a unified framework (II).…”
Section: Related Work (mentioning)
confidence: 99%
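The comparison above notes that several cited works measure DNN resilience to JPEG/JPEG2000 compression at a handful of operating points. As a purely illustrative sketch (the helper name, quality grid, and synthetic test frame are assumptions, not taken from any of the cited papers), the snippet below re-encodes an image at several JPEG quality factors and reports the rate, so that a detector could then be run on each decoded version to track how task accuracy degrades with compression.

```python
import io
import numpy as np
from PIL import Image

def jpeg_rate_sweep(img, qualities=(90, 70, 50, 30, 10)):
    """Re-encode a PIL image at several JPEG quality factors.

    Returns (quality, bits-per-pixel, decoded image) tuples; the decoded
    images can be fed to any object detector to evaluate its resilience
    to increasing compression.
    """
    results = []
    for q in qualities:
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=q)
        bpp = 8 * buf.tell() / (img.width * img.height)
        decoded = Image.open(io.BytesIO(buf.getvalue())).convert("RGB")
        results.append((q, bpp, decoded))
    return results

# Usage on a synthetic test frame (replace with real dataset images).
frame = Image.fromarray(np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8))
for q, bpp, _ in jpeg_rate_sweep(frame):
    print(f"quality={q:3d}  rate={bpp:.3f} bit/px")
```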
“…Early work in this area optimized the compression to preserve extracted features [2,3,4]. More recent work focuses on coding for M2M communication, where a neural network analyzes the decoded image instead of a human observer [5,6,7,8,9,10]. This topic is commonly referred to as video coding for machines (VCM).…”
Section: Introduction (mentioning)
confidence: 99%
“…To the best of our knowledge, there is currently no dataset available that fulfills all three conditions. Thus, previous VCM research either sacrifices the condition of uncompressed input data [5,6] or evaluates the coding frameworks on single images [7,8,9,10].…”
Section: Introduction (mentioning)
confidence: 99%