2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)
DOI: 10.23919/date54114.2022.9774761
Reconciling QoS and Concurrency in NVIDIA GPUs via Warp-Level Scheduling

Abstract: The widespread deployment of NVIDIA GPUs in latency-sensitive systems today requires predictable GPU multi-tasking, which cannot be trivially achieved. The NVIDIA CUDA API allows programmers to easily exploit the processing power provided by these massively parallel accelerators and is one of the major reasons behind their ubiquity. However, NVIDIA GPUs and the CUDA programming model favor throughput instead of latency and timing predictability. Hence, providing real-time and quality-of-service (QoS) properties…

Cited by 3 publications (1 citation statement); references 17 publications.
“…Their timing impact is taken into account in order to provide timing guarantees through schedulability analysis [10], [33], [47]. Other approaches targeting the automotive domain use diverse redundancy in the form of dual-lockstep execution potentially combined with check-pointing [18] or exploiting the intrinsic redundancy available in hardware platforms [3], [45]. Other real-time solutions based on hardware redundancy focus on faults in memories, to maintain the initial timing characteristics of hardware, despite the presence of faults, e.g., by using cache redundant entries [1].…”
Section: Related Work
confidence: 99%