2016 IEEE 23rd Symposium on Computer Arithmetic (ARITH) 2016
DOI: 10.1109/arith.2016.26

Quad Precision Floating Point on the IBM z13

Cited by 18 publications (11 citation statements)
References 12 publications
“…Table 6 compares the latency and total time of the proposed division unit with these of classic processors for FP SP and DP with normalized operands and result, Intel Penryn [59], IBM zSeries [60], IBM z13 [61], HAL Sparc [62], AMD K7 [63], AMD Jaguar [64].…”
Section: Related Work and Comparisons (mentioning)
confidence: 99%
“…Consequently, the latency is almost halved with respect to that of the radix-4 unit. The IBM z13 processor [61] has a divide unit supporting SP, DP, QP, and all the hexadecimal FP data types. The underlying algorithm is a radix-8 division generating 3 bits per cycle.…”
Section: Related Work and Comparisons (mentioning)
confidence: 99%
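
To make the "radix-8, 3 bits per cycle" description in the statement above concrete, here is a minimal sketch of a radix-8 digit recurrence on non-negative integers. It is an illustration only, not the z13 design: hardware dividers of this kind typically use a redundant digit set and select digits from a truncated remainder, which the statement does not detail, whereas this sketch uses plain restoring selection with digits 0..7. The function name `radix8_divide` and its `width` parameter are hypothetical.

```python
def radix8_divide(dividend: int, divisor: int, width: int = 24):
    """Restoring radix-8 division of non-negative integers.

    Each loop iteration consumes 3 dividend bits and produces one
    quotient digit in 0..7, i.e. 3 quotient bits per iteration.
    Returns (quotient, remainder, iterations).
    """
    if divisor <= 0 or dividend < 0:
        raise ValueError("expects dividend >= 0 and divisor > 0")

    # Number of radix-8 digits needed to cover `width` bits.
    digits = (width + 2) // 3
    if dividend >= 1 << (3 * digits):
        raise ValueError("dividend does not fit in the given width")

    quotient = 0
    remainder = 0
    for i in range(digits - 1, -1, -1):
        # Shift the partial remainder left by one radix-8 digit and
        # bring in the next 3 bits of the dividend.
        chunk = (dividend >> (3 * i)) & 0b111
        remainder = (remainder << 3) | chunk

        # Select the largest digit d in 0..7 with d * divisor <= remainder.
        # Since the previous remainder was below the divisor, 7 suffices.
        digit = 0
        while digit < 7 and remainder >= (digit + 1) * divisor:
            digit += 1

        remainder -= digit * divisor
        quotient = (quotient << 3) | digit

    return quotient, remainder, digits
```

With `width=24` the loop runs 8 times, compared with 12 iterations for a radix-4 recurrence (2 bits per step) and 24 for radix-2, which is the sense in which a higher radix shortens latency.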
“…Half precision was defined for storage only, but several manufacturers now support it for computation. Quadruple precision is available only in software, with the exception of the IBM z13 mainframe systems, designed for business analytics workloads [8]. Another form of half precision called bfloat16 was introduced by Google on its Tensor Processing Unit and will be supported by Intel in its forthcoming Nervana Neural Network Processor and Cooper Lake processor and on the Armv8-A architecture [9], [10], [11], [12], [13].…”
Section: Mixed Precision Algorithms (mentioning)
confidence: 99%
“…In each iteration cycle, the partial square-root digits with fixed bit-width can be obtained. At present, the most widely used digital recursive algorithm is SRT, in Intel or IBM [7,8] processor cores, the SRT algorithm with lower radix is used to implemented square-root circuit. In the standard SRT algorithm, although the higher radix can improve the computational performance, the area cost of the lookup table increases in quadratic with the radix [9].…”
Section: Introduction (mentioning)
confidence: 99%