Proceedings of the 2019 International Symposium on Symbolic and Algebraic Computation (ISSAC 2019)
DOI: 10.1145/3326229.3326273

Big Prime Field FFT on Multi-core Processors

Abstract: We report on a multi-threaded implementation of Fast Fourier Transforms over generalized Fermat prime fields. This work extends a previous study realized on graphics processing units to multi-core processors. In this new context, we overcome the coarser control of hardware resources by successively using FFT in support of the multiplication in those fields. We obtain favorable speedup factors (up to 6.9x on a 6-core, 12-thread node, and 4.3x on a 4-core, 8-thread node) of our parallel implementation compar…
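To make the setting concrete, here is a minimal sketch of an FFT over a small field of the same family: p = 2^8 + 1 = 257 is a (generalized) Fermat prime with r = 2 and k = 8, and 3 is a primitive root mod 257, so 3^(256/n) is a primitive n-th root of unity for any power of two n dividing 256. This is only the single-word analogue; the paper works over much larger primes p = r^k + 1 whose elements span many machine words, and the function and constant names below are illustrative, not the authors' code.

#include <stdint.h>

#define P 257u   /* toy generalized Fermat prime: p = 2^8 + 1 */

static uint32_t mulmod(uint32_t a, uint32_t b) { return (a * b) % P; }

static uint32_t powmod(uint32_t a, uint32_t e)
{
    uint32_t r = 1;
    for (; e; e >>= 1, a = mulmod(a, a))
        if (e & 1) r = mulmod(r, a);
    return r;
}

/* In-place iterative Cooley-Tukey FFT of length n (a power of two
 * dividing 256) on the array a[0..n-1] of residues mod 257. */
void fft_mod257(uint32_t *a, uint32_t n)
{
    /* bit-reversal permutation */
    for (uint32_t i = 1, j = 0; i < n; ++i) {
        uint32_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) { uint32_t t = a[i]; a[i] = a[j]; a[j] = t; }
    }
    /* butterfly passes */
    for (uint32_t len = 2; len <= n; len <<= 1) {
        uint32_t wlen = powmod(3, 256 / len);   /* primitive len-th root of unity */
        for (uint32_t i = 0; i < n; i += len) {
            uint32_t w = 1;
            for (uint32_t j = 0; j < len / 2; ++j) {
                uint32_t u = a[i + j];
                uint32_t v = mulmod(a[i + j + len / 2], w);
                a[i + j] = (u + v) % P;
                a[i + j + len / 2] = (u + P - v) % P;
                w = mulmod(w, wlen);
            }
        }
    }
}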

Cited by 6 publications (3 citation statements); references 22 publications (35 reference statements).
“…The library is mainly written in C for performance, with a C++ wrapper interface for portability, object-oriented programming, and end-user usability. Parallelism is already employed in BPAS in its implementations of real root isolation [7], dense polynomial arithmetic [5], and FFT-based arithmetic for prime fields [8]. These implementations make use of the Cilk extension of C/C++ for parallelism.…”
Section: Methods
confidence: 99%
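The Cilk extension mentioned in this statement exposes parallelism with a handful of keywords. The sketch below, which is not the BPAS code, shows the typical pattern for FFT-based arithmetic: many independent transforms (here the toy fft_mod257 above) dispatched with cilk_for so the work-stealing scheduler balances them across cores. It assumes an OpenCilk or Cilk Plus compatible compiler (e.g. clang -fopencilk), and the batch layout is a hypothetical illustration.

#include <stdint.h>
#include <cilk/cilk.h>

void fft_mod257(uint32_t *a, uint32_t n);   /* the transform sketched above */

/* Apply an independent length-256 FFT to each row in parallel. */
void batch_fft_mod257(uint32_t (*rows)[256], uint32_t nrows)
{
    cilk_for (uint32_t i = 0; i < nrows; ++i)
        fft_mod257(rows[i], 256);            /* rows are independent tasks */
}
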
“…For certain operations, like dense matrix multiplication following the algorithm of [27], an implementation in Cilk can reach linear speedup on multicore processors, for sufficiently large input. For others like FFT-based dense polynomial multiplication [15,20,42], speedup factors like 9× on 12 cores can be reached for sufficiently large input. The ratio between work and memory accesses for FFT-based dense polynomial multiplication is much smaller than that for dense matrix multiplication.…”
Section: Fork-join
confidence: 99%
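For readers unfamiliar with the fork-join model this statement refers to, here is a minimal divide-and-conquer sketch in Cilk: the two halves are forked with cilk_spawn and joined at cilk_sync. It only illustrates the programming model; whether such code approaches linear speedup depends, as the quote notes, on the ratio of work to memory traffic. The example is generic (not the algorithm of [27]) and assumes an OpenCilk-compatible compiler.

#include <cilk/cilk.h>
#include <stddef.h>

/* Recursive fork-join sum of a[0..n-1]. */
long vector_sum(const long *a, size_t n)
{
    if (n < 4096) {                    /* serial base case */
        long s = 0;
        for (size_t i = 0; i < n; ++i)
            s += a[i];
        return s;
    }
    long left = cilk_spawn vector_sum(a, n / 2);        /* fork left half */
    long right = vector_sum(a + n / 2, n - n / 2);      /* right half in parent */
    cilk_sync;                                          /* join before combining */
    return left + right;
}
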
“…The library is mainly written in the C language for high performance, with a simplified C++ interface for end-user usability and object-oriented programming. The BPAS library also makes use of parallelization (e.g., via the CILK extension [14]) for added performance on multi-core architectures, such as in dense polynomial arithmetic [15,16] and arithmetic for big prime fields based on Fast Fourier Transform (FFT) [17]. Despite these previous achievements, the work presented here is in active development and has not yet been parallelized.…”
Section: Introduction
confidence: 99%