The constant push for feature richness in mobile and embedded devices has significantly increased computational demand, yet stringent energy constraints typically remain in place. Embedding processor cores in FPGAs offers a path to customized instruction-set processors that can meet these performance and energy demands. Ideally, the customization process is automated, reducing design effort and, indirectly, time to market. However, the automatic generation of custom extensions for floating-point computation remains a challenge in FPGA co-design. We propose an approach for accelerating such computation via application-specific SIMD extensions. We describe an automated co-design toolchain that generates both code and application-specific platform extensions implementing SIMD instructions with a parameterizable number of vector elements. The parallelism exposed by encapsulating computation in vector instructions is matched to an adjustable pool of execution units. Experiments on real hardware show significant performance improvements. Our framework extends the capabilities of FPGA-embedded processors, which have traditionally handled bit-level, integer, and low-intensity floating-point code, to vectorizable floating-point computation.
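To make the target workload concrete, the sketch below shows a hypothetical vectorizable single-precision kernel of the kind such a toolchain could accelerate; this is a minimal illustration, not the toolchain's actual input or generated code. The function name, the assumed vector length VL, and the strip-mined loop structure are assumptions introduced here for exposition only.

```c
#include <stddef.h>
#include <stdio.h>

#define VL 4  /* assumed parameterizable number of vector elements */

/* Scalar AXPY kernel: y[i] += a * x[i]. In a co-design flow of the kind
 * described above, the strip-mined inner iterations could be encapsulated
 * in an application-specific SIMD instruction executed by an adjustable
 * pool of floating-point execution units. */
static void saxpy(float a, const float *x, float *y, size_t n)
{
    size_t i = 0;
    /* Strip-mined main loop: each outer iteration corresponds to one
     * VL-wide vector operation on the customized processor. */
    for (; i + VL <= n; i += VL)
        for (size_t j = 0; j < VL; ++j)
            y[i + j] += a * x[i + j];
    /* Scalar epilogue for the remaining n % VL elements. */
    for (; i < n; ++i)
        y[i] += a * x[i];
}

int main(void)
{
    float x[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float y[8] = {0};
    saxpy(2.0f, x, y, 8);
    printf("%f %f\n", y[0], y[7]);  /* expected: 2.000000 16.000000 */
    return 0;
}
```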