2021
DOI: 10.1145/3459010

KernelFaRer

Abstract: Well-crafted libraries deliver much higher performance than code generated by sophisticated application programmers using advanced optimizing compilers. When a code pattern for which a well-tuned library implementation exists is found in the source code of an application, the highest performing solution is to replace the pattern with a call to the library. Idiom-recognition solutions in the past either required pattern matching machinery that was outside of the compilation framework or provided a very brittle …
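As a rough illustration of the pattern-to-library replacement the abstract describes, the sketch below contrasts a hand-written GEMM loop nest (the kind of idiom a recognizer looks for) with a single library call that computes the same result. It is written in Python for readability and uses NumPy's matrix product as a stand-in for a tuned library such as BLAS. The function names are hypothetical, and this is not KernelFaRer's own mechanism, only a picture of what "replace the pattern with a call to the library" means.

```python
import numpy as np


def gemm_loop_nest(A, B, C, alpha=1.0, beta=1.0):
    """Hand-written GEMM idiom: C := alpha * (A @ B) + beta * C, updated in place."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2 and C.shape == (M, N)
    for i in range(M):
        for j in range(N):
            acc = 0.0
            for k in range(K):  # reduction over the shared dimension
                acc += A[i, k] * B[k, j]
            C[i, j] = alpha * acc + beta * C[i, j]
    return C


def gemm_library_call(A, B, C, alpha=1.0, beta=1.0):
    """Same computation as one call into an optimized library (NumPy here)."""
    # The right-hand side is evaluated with the old C before the in-place write.
    C[...] = alpha * (A @ B) + beta * C
    return C


rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 5))
C0 = rng.standard_normal((4, 5))

ref = gemm_loop_nest(A, B, C0.copy(), alpha=2.0, beta=0.5)
fast = gemm_library_call(A, B, C0.copy(), alpha=2.0, beta=0.5)
print(np.allclose(ref, fast))  # True
```

The loop version is kept deliberately literal so the equivalence is easy to see; a compiler performing this substitution would additionally have to establish that it preserves semantics, for example that C does not alias A or B.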

Cited by 13 publications (11 citation statements)
References 17 publications

“…Thus, the performance results for Polly cannot be included in the comparison at this time. As reported by Carvalho et al [16], we also failed to reproduce the performance results reported by Gareev et al's recent work [19]. Enabling Polly's GEMM idiom recognition and optimization pass [10] the code runs 4.8× slower than Tiling+Packing.…”
Section: Performance Comparison Against Other Compiler-only Approaches (contrasting, confidence: 53%)

“…Therefore, for f64 the value of nr should be reduced in half to reflect the number of VSRs available. With this reduction, an ATile tile occupies 16 VSRs and a BTile tile also occupies 16 VSRs. The extraction of operands into vector registers in lines 6 and 9 of Algorithm 2 must be changed accordingly.…”
Section: Other Data Types (mentioning, confidence: 99%)
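The register arithmetic behind the quoted f64 adjustment can be checked in a few lines. The sketch assumes 128-bit VSRs (the POWER VSX register width, with 64 VSRs in total) and a hypothetical 4 × 16 tile, chosen only so that the f32 case occupies 16 VSRs; the actual tile dimensions used in the citing work are not given in this excerpt. Because one VSR holds four f32 values but only two f64 values, the same tile needs twice as many registers in f64, and halving its width nr brings it back to 16 VSRs.

```python
VSR_BITS = 128  # width of one POWER VSX vector-scalar register
NUM_VSRS = 64   # architected VSRs in POWER VSX


def vsrs_for_tile(rows: int, nr: int, elem_bits: int) -> int:
    """Registers needed to hold a rows x nr tile of elem_bits-wide values."""
    lanes = VSR_BITS // elem_bits  # elements per register
    elems = rows * nr
    return -(-elems // lanes)      # ceiling division


# Hypothetical tile shape, chosen only for illustration.
ROWS, NR = 4, 16

f32 = vsrs_for_tile(ROWS, NR, 32)            # 64 elements / 4 lanes = 16 VSRs
f64 = vsrs_for_tile(ROWS, NR, 64)            # 64 elements / 2 lanes = 32 VSRs
f64_half = vsrs_for_tile(ROWS, NR // 2, 64)  # 32 elements / 2 lanes = 16 VSRs

print(f32, f64, f64_half)             # 16 32 16
# Two such tiles (e.g. one ATile and one BTile) then occupy:
print(2 * f64_half, "of", NUM_VSRS)   # 32 of 64
```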