2010
DOI: 10.1090/s0025-5718-2010-02367-1

A multimodular algorithm for computing Bernoulli numbers

Abstract: We describe an algorithm for computing Bernoulli numbers. Using a parallel implementation, we have computed B_k for k = 10^8, a new record. Our method is to compute B_k modulo p for many small primes p and then reconstruct B_k via the Chinese Remainder Theorem. The asymptotic time complexity is O(k^2 log^{2+ε} k), matching that of existing algorithms that exploit the relationship between B_k and the Riemann zeta function. Our implementation is significantly faster than several existing implementations…
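
The multimodular idea in the abstract can be illustrated with a small Python sketch. It is not the paper's algorithm: the per-prime step here uses the classical power-sum congruence p·B_k ≡ 1^k + 2^k + … + (p-1)^k (mod p^2), valid for primes p ≥ 5, as a simple stand-in for the paper's much faster formula. The denominator is taken from the von Staudt-Clausen theorem, and the numerator bound from |B_k| = 2·k!·ζ(k)/(2π)^k < 4·k!/6^k. The function names (`is_prime`, `bernoulli_mod_p`, `bernoulli_multimodular`) are hypothetical, and the code needs Python 3.8 or later.

```python
from fractions import Fraction
from math import factorial, prod


def is_prime(n):
    """Trial-division primality test; adequate for the small primes used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def bernoulli_mod_p(k, p):
    """B_k mod p for even k >= 2 and a prime p >= 5 with (p-1) not dividing k.

    Assumption of this sketch: uses p*B_k == sum(a^k) (mod p^2),
    not the formula from the paper.
    """
    p2 = p * p
    s = sum(pow(a, k, p2) for a in range(1, p)) % p2
    return (s // p) % p  # s is a multiple of p because (p-1) does not divide k


def bernoulli_multimodular(k):
    """Reconstruct B_k exactly from its residues modulo many small primes."""
    if k == 0:
        return Fraction(1)
    if k == 1:
        return Fraction(-1, 2)
    if k % 2:
        return Fraction(0)
    # von Staudt-Clausen: the denominator is the product of primes p with (p-1) | k.
    d = prod(p for p in range(2, k + 2) if is_prime(p) and k % (p - 1) == 0)
    # Crude bound on |numerator|, from |B_k| < 4 * k! / 6^k.
    bound = 4 * d * factorial(k) // 6**k + 1
    residue, modulus = 0, 1
    p = 5
    while modulus <= 2 * bound:
        if is_prime(p) and k % (p - 1) != 0:
            r = bernoulli_mod_p(k, p) * d % p  # numerator mod p
            # Incremental Chinese Remainder Theorem step.
            t = (r - residue) * pow(modulus, -1, p) % p
            residue += modulus * t
            modulus *= p
        p += 2
    # Take the symmetric residue to recover the sign of the numerator.
    if residue > modulus // 2:
        residue -= modulus
    return Fraction(residue, d)


print(bernoulli_multimodular(12))
```

Each prime is handled independently of the others, which is the property that makes this style of computation easy to parallelize. This sketch spends O(p log k) time per prime and is only practical for small k.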

Citations: cited by 22 publications (44 citation statements)
References: 14 publications
“…It is possible to compute the Bernoulli numbers in parallel [12]. However, such parallelization is not done in Mathematica, and it has not been done in the computer experiments to be described in the examples in Section 8, mainly because, as was noted, except for a very narrow set of functions f, the time needed to compute the Bernoulli numbers B_2, .…”
Section: Parallelization and Memory (mentioning)
confidence: 99%
“…The usually very large amount of time needed to compute those derivatives certainly precludes values of m > (1/2)·10^4. On the other hand, for values of m ≤ (1/2)·10^4, the only execution time reported in [12] is for B_{2m} = B_{10^4}, with m = (1/2)·10^4, and only for a one-core calculation, which took 0.25 sec (on a 16-core 2.6 GHz AMD Opteron (64-bit) machine with 96 GB RAM, running Ubuntu Linux). In comparison, it took Mathematica just about 0.05 sec to compute the same number, B_{10^4} (on a roughly comparable 12-core 2.30 GHz Intel Xeon (64-bit) machine with 128 GB RAM, running Windows 7).…”
Section: Parallelization and Memory (mentioning)
confidence: 99%
“…We use truncated Fourier transforms [30] to avoid power-of-two jumps in the running time; that is, taking N to be a suitably large power of two, instead of evaluating at all N-th roots of unity, we evaluate only on a subset of those roots large enough to determine the polynomial product of interest. We aggressively use array decompositions of FFTs, adapted to the truncated case [9], to improve locality. Finally, we use OpenMP throughout, including within the FFTs themselves, to take advantage of multiple processor cores.…”
Section: Horizontal DFTs - The Umbrella Algorithm (mentioning)
confidence: 99%