2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)
DOI: 10.23919/date54114.2022.9774761
Reconciling QoS and Concurrency in NVIDIA GPUs via Warp-Level Scheduling

Abstract: The widespread deployment of NVIDIA GPUs in latency-sensitive systems today requires predictable GPU multi-tasking, which cannot be trivially achieved. The NVIDIA CUDA API allows programmers to easily exploit the processing power provided by these massively parallel accelerators and is one of the major reasons behind their ubiquity. However, NVIDIA GPUs and the CUDA programming model favor throughput instead of latency and timing predictability. Hence, providing real-time and quality-of-service (QoS) properties…

Cited by 3 publications (1 citation statement); references 17 publications.
“…Their timing impact is taken into account in order to provide timing guarantees through schedulability analysis [10], [33], [47]. Other approaches targeting the automotive domain use diverse redundancy in the form of dual-lockstep execution potentially combined with check-pointing [18] or exploiting the intrinsic redundancy available in hardware platforms [3], [45]. Other real-time solutions based on hardware redundancy focus on faults in memories, to maintain the initial timing characteristics of hardware, despite the presence of faults, e.g., by using cache redundant entries [1].…”
Section: Related Work
confidence: 99%