Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019
DOI: 10.1145/3297858.3304043

A Formal Analysis of the NVIDIA PTX Memory Consistency Model

Abstract: This paper presents the first formal analysis of the official memory consistency model for the NVIDIA PTX virtual ISA. Like other GPU memory models, the PTX memory model is weakly ordered but provides scoped synchronization primitives that enable GPU program threads to communicate through memory. However, unlike some competing GPU memory models, PTX does not require data race freedom, and this results in PTX using a fundamentally different (and more complicated) set of rules in its memory model. As such, PTX h…

Cited by 31 publications (14 citation statements)
References 33 publications

“…Specifically, if there are two or more stores to overlapping locations from a lockstep execution, a determination as to which one was the last store to that location is not possible and hence the value of the subsequent load to that location is undefined. This is similar to the reasoning provided in prior works where the outcome of racey accesses that "occur at the same time" are undefined [5,23,36,46,55]. As a result, LSC does not impose any restriction on the value of overlapping stores from a lockstep execution.…”
Section: Hardware Design Implications
Type: supporting
confidence: 64%
“…HRF defines scopes in terms of the execution hierarchy of GPUs. For example, work-items within the same work-group (threadblock) synchronize through work-group scope, and work-items from different work-groups synchronize through device scope (scopes are present in other models as well [54]). While use of such scopes are well defined for synchronizing between work-items of a GPU, the synchronization between work-items and threads running on other processing elements on the same GPU is not clearly defined.…”
Section: Limitations of HRF
Type: mentioning
confidence: 99%
“…Memory Modelling. CPU memory models such as x86 [Owens et al 2009], POWER, Arm [Pulte et al 2017], and RISC-V [Pulte et al 2019] are now fairly well understood, as are some GPU memory models [Alglave et al 2015; Lustig et al 2019]. However, these models do not apply to systems where threads are on different devices.…”
Section: Further Related Work
Type: mentioning
confidence: 99%
“…Modelling the concurrency aspects of the Armv8 architecture entails developing a consistency model for Armv8. Consistency models determine what values a read can take; weak consistency models such as the ones of Arm [4,23], IBM [37,38], Intel [39,40], Nvidia [10,33], RISC-V [3], C++ [20,31], Linux [15], and others allow more behaviours than Sequential Consistency (SC) [32].…”
Section: Design Principles and Rationale
Type: mentioning
confidence: 99%
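
The last statement above notes that a consistency model determines what values a read can take, and that weak models allow more behaviours than Sequential Consistency (SC). As a rough illustration only, the Python sketch below enumerates every SC interleaving of the classic store-buffering litmus test and collects the values the two loads can return. The test program, the names `THREAD_0`, `THREAD_1` and `sc_outcomes`, and the tuple encoding of operations are assumptions made for this example; none of them come from the cited papers, and the sketch does not model PTX scopes or any specific weak model.

```python
from itertools import permutations

# Store-buffering litmus test (hypothetical encoding for this sketch).
# Shared locations x and y both start at 0.
# Thread 0: x = 1; r0 = y        Thread 1: y = 1; r1 = x
THREAD_0 = [("store", "x", 1), ("load", "y", "r0")]
THREAD_1 = [("store", "y", 1), ("load", "x", "r1")]


def sc_outcomes(t0, t1):
    """Return the set of (r0, r1) results over all SC interleavings.

    Under SC, every execution is some interleaving of the two threads
    that preserves each thread's program order, and each load returns
    the most recent store to its location in that interleaving.
    """
    threads = (t0, t1)
    # An interleaving is an ordering of thread ids, e.g. (0, 1, 1, 0).
    tags = [0] * len(t0) + [1] * len(t1)
    outcomes = set()
    for order in set(permutations(tags)):
        memory = {"x": 0, "y": 0}
        regs = {}
        pos = [0, 0]  # next operation index for each thread
        for tid in order:
            kind, loc, arg = threads[tid][pos[tid]]
            pos[tid] += 1
            if kind == "store":
                memory[loc] = arg
            else:
                regs[arg] = memory[loc]
        outcomes.add((regs["r0"], regs["r1"]))
    return outcomes


if __name__ == "__main__":
    print(sorted(sc_outcomes(THREAD_0, THREAD_1)))
```

In this sketch the result (r0, r1) = (0, 0) never appears, because in any interleaving at least one store precedes both loads. Models that let a store be delayed past a later load by the same thread, such as x86-TSO, are known to admit that extra outcome, which is one concrete sense in which a weak model allows more behaviours than SC.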