2010 IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems 2010
DOI: 10.1109/mascots.2010.43

Barra: A Parallel Functional Simulator for GPGPU

Abstract: We present Barra, a simulator of Graphics Processing Units (GPU) tuned for general purpose processing (GPGPU). It is based on the UNISIM framework and it simulates the native instruction set of the Tesla architecture at the functional level. The inputs are CUDA executables produced by NVIDIA tools. No alterations are needed to perform simulations. As it uses parallelism, Barra generates detailed statistics on executions in about the time needed by CUDA to operate in emulation mode. We use it to explore the mic…

Cited by 84 publications (46 citation statements)
References: 19 publications
“…We develop a memory transaction simulator to compute the number of transactions at the hardware level. We use the functional simulator Barra [6] to generate the dynamic program execution information on how many times each instruction is executed. Then we use this information to generate the number of dynamic instructions of each type, the number of shared memory transactions, the number of global memory transactions, and the number of stages divided by synchronization barriers.…”
Section: Performance Modeling and Analysis Methodology
Citation type: mentioning
confidence: 99%
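
The aggregation step described in this citation statement, turning per-instruction execution counts into per-type totals, might look roughly like the sketch below. It is an illustration only, not the citing authors' tool and not Barra's output format: the `(opcode, exec_count)` record layout, the opcode names, and the `CATEGORY` table are all hypothetical.

```python
from collections import Counter

# Hypothetical opcode-to-category table (an assumption for illustration;
# not the Tesla ISA and not names taken from Barra).
CATEGORY = {
    "add": "arithmetic",
    "mad": "arithmetic",
    "ld.shared": "shared_memory",
    "st.shared": "shared_memory",
    "ld.global": "global_memory",
    "st.global": "global_memory",
    "bar.sync": "barrier",
}


def count_by_type(profile):
    """Sum per-instruction execution counts into per-category totals.

    `profile` is an iterable of (opcode, exec_count) pairs, one pair per
    static instruction, in the shape a functional simulator could report.
    Opcodes missing from CATEGORY are grouped under "other".
    """
    totals = Counter()
    for opcode, exec_count in profile:
        totals[CATEGORY.get(opcode, "other")] += exec_count
    return totals


if __name__ == "__main__":
    # Made-up sample profile, used only to exercise the function.
    sample = [("add", 10), ("ld.global", 4), ("st.global", 2), ("bar.sync", 1)]
    for category, count in sorted(count_by_type(sample).items()):
        print(category, count)
```

Converting these totals into hardware memory transaction counts would additionally require the address patterns of each access, which this sketch does not model.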
“…Since the instruction set of native machine code is not publicly documented, we use the disassembler Decuda developed by van der Laan [16], on which Barra [6] is based as well. With the assistance of Decuda, we build a tool to modify the original binary instructions, assemble the modified instructions back to the binary code sequence, and finally embed the modified code into the execution file.…”
Section: Performance Modeling
Citation type: mentioning
confidence: 99%