Proceedings of the 2019 International Symposium on Symbolic and Algebraic Computation (ISSAC 2019)
DOI: 10.1145/3326229.3326273

Big Prime Field FFT on Multi-core Processors

Abstract: We report on a multi-threaded implementation of Fast Fourier Transforms over generalized Fermat prime fields. This work extends a previous study realized on graphics processing units to multi-core processors. In this new context, we overcome the coarser control of hardware resources by successively using FFT in support of the multiplication in those fields. We obtain favorable speedup factors (up to 6.9x on a 6-core, 12-thread node, and 4.3x on a 4-core, 8-thread node) of our parallel implementation compar…
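To make the setting concrete, here is a minimal sketch of an FFT over a small field of the same family: p = 2^8 + 1 = 257 is a (generalized) Fermat prime with r = 2 and k = 8, and 3 is a primitive root mod 257, so 3^(256/n) is a primitive n-th root of unity for any power of two n dividing 256. This is only the single-word analogue; the paper works over much larger primes p = r^k + 1 whose elements span many machine words, and the function and constant names below are illustrative, not the authors' code.

#include <stdint.h>

#define P 257u   /* toy generalized Fermat prime: p = 2^8 + 1 */

static uint32_t mulmod(uint32_t a, uint32_t b) { return (a * b) % P; }

static uint32_t powmod(uint32_t a, uint32_t e)
{
    uint32_t r = 1;
    for (; e; e >>= 1, a = mulmod(a, a))
        if (e & 1) r = mulmod(r, a);
    return r;
}

/* In-place iterative Cooley-Tukey FFT of length n (a power of two
 * dividing 256) on the array a[0..n-1] of residues mod 257. */
void fft_mod257(uint32_t *a, uint32_t n)
{
    /* bit-reversal permutation */
    for (uint32_t i = 1, j = 0; i < n; ++i) {
        uint32_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) { uint32_t t = a[i]; a[i] = a[j]; a[j] = t; }
    }
    /* butterfly passes */
    for (uint32_t len = 2; len <= n; len <<= 1) {
        uint32_t wlen = powmod(3, 256 / len);   /* primitive len-th root of unity */
        for (uint32_t i = 0; i < n; i += len) {
            uint32_t w = 1;
            for (uint32_t j = 0; j < len / 2; ++j) {
                uint32_t u = a[i + j];
                uint32_t v = mulmod(a[i + j + len / 2], w);
                a[i + j] = (u + v) % P;
                a[i + j + len / 2] = (u + P - v) % P;
                w = mulmod(w, wlen);
            }
        }
    }
}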

Cited by 6 publications (3 citation statements); references 22 publications (35 reference statements).
“…The library is mainly written in C for performance, with a C++ wrapper interface for portability, object-oriented programming, and end-user usability. Parallelism is already employed in BPAS in its implementations of real root isolation [7], dense polynomial arithmetic [5], and FFT-based arithmetic for prime fields [8]. These implementations make use of the Cilk extension of C/C++ for parallelism.…”
Section: Methods
confidence: 99%
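The Cilk extension mentioned in this statement exposes parallelism with a handful of keywords. The sketch below, which is not the BPAS code, shows the typical pattern for FFT-based arithmetic: many independent transforms (here the toy fft_mod257 above) dispatched with cilk_for so the work-stealing scheduler balances them across cores. It assumes an OpenCilk or Cilk Plus compatible compiler (e.g. clang -fopencilk), and the batch layout is a hypothetical illustration.

#include <stdint.h>
#include <cilk/cilk.h>

void fft_mod257(uint32_t *a, uint32_t n);   /* the transform sketched above */

/* Apply an independent length-256 FFT to each row in parallel. */
void batch_fft_mod257(uint32_t (*rows)[256], uint32_t nrows)
{
    cilk_for (uint32_t i = 0; i < nrows; ++i)
        fft_mod257(rows[i], 256);            /* rows are independent tasks */
}
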
“…For certain operations, like dense matrix multiplication following the algorithm of [27], an implementation in Cilk can reach linear speedup on multicore processors, for sufficiently large input. For others like FFT-based dense polynomial multiplication [15,20,42], speedup factors like 9× on 12 cores can be reached for sufficiently large input. The ratio between work and memory accesses for FFT-based dense polynomial multiplication is much smaller than that for dense matrix multiplication.…”
Section: Fork-join
confidence: 99%
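For readers unfamiliar with the fork-join model this statement refers to, here is a minimal divide-and-conquer sketch in Cilk: the two halves are forked with cilk_spawn and joined at cilk_sync. It only illustrates the programming model; whether such code approaches linear speedup depends, as the quote notes, on the ratio of work to memory traffic. The example is generic (not the algorithm of [27]) and assumes an OpenCilk-compatible compiler.

#include <cilk/cilk.h>
#include <stddef.h>

/* Recursive fork-join sum of a[0..n-1]. */
long vector_sum(const long *a, size_t n)
{
    if (n < 4096) {                    /* serial base case */
        long s = 0;
        for (size_t i = 0; i < n; ++i)
            s += a[i];
        return s;
    }
    long left = cilk_spawn vector_sum(a, n / 2);        /* fork left half */
    long right = vector_sum(a + n / 2, n - n / 2);      /* right half in parent */
    cilk_sync;                                          /* join before combining */
    return left + right;
}
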
“…The library is mainly written in the C language for high performance, with a simplified C++ interface for end-user usability and object-oriented programming. The BPAS library also makes use of parallelization (e.g., via the CILK extension [14]) for added performance on multi-core architectures, such as in dense polynomial arithmetic [15,16] and arithmetic for big prime fields based on Fast Fourier Transform (FFT) [17]. Despite these previous achievements, the work presented here is in active development and has not yet been parallelized.…”
Section: Introduction
confidence: 99%