“…Ballard, Demmel, Holtz, Lipshitz, and Schwartz [15], Ballard, Demmel, Holtz, and Schwartz [16], and Lipshitz, Ballard, Demmel, and Schwartz [77]. Recent engineering work includes Benson and Ballard [23] and Huang, Rice, Matthews, and van de Geijn [65]. Our work differs from these works in that we seek a self-contained proof-of-concept demonstration requiring good-performance finite-field matrix multiplication on GPUs, but we do not necessarily seek the most optimized possible implementation; such optimizations are left for future work.…”