2023
DOI: 10.1017/s0962492922000101
|View full text |Cite
|
Sign up to set email alerts
|

Floating-point arithmetic

Abstract: Floating-point numbers have an intuitive meaning when it comes to physics-based numerical computations, and they have thus become the most common way of approximating real numbers in computers. The IEEE-754 Standard has played a large part in making floating-point arithmetic ubiquitous today, by specifying its semantics in a strict yet useful way as early as 1985. In particular, floating-point operations should be performed as if their results were first computed with an infinite precision and then rounded to … Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
1
1

Citation Types

0
4
0

Year Published

2024
2024
2024
2024

Publication Types

Select...
4
1

Relationship

1
4

Authors

Journals

citations
Cited by 5 publications
(4 citation statements)
references
References 149 publications
0
4
0
Order By: Relevance
“…Since c is between 1 and 2, the bound ( 8) is always less than the bound (7). Hence, the bound (7) holds in all cases. We immediately deduce…”
Section: The General Case Ulp(ĉx) Ulp(cx)mentioning
confidence: 96%
See 1 more Smart Citation
“…Since c is between 1 and 2, the bound ( 8) is always less than the bound (7). Hence, the bound (7) holds in all cases. We immediately deduce…”
Section: The General Case Ulp(ĉx) Ulp(cx)mentioning
confidence: 96%
“…It is wiser to measure errors in terms of ulps of the exact result instead of ulps of the computed result, because the latter choice could lead to dubious conclusions. The authors of [7,Section 2.5] illustrate this as follows:…”
Section: Introductionmentioning
confidence: 99%
“…For some structures, exponentials e σj |Sj | can be huge. To bypass computer arithmetic problems [18], we introduced bounded matrices M j = M j e −σj |Sj | and explored them to construct the characteristic function. An example of calculated vertical mode, the corresponding function n 0 (z) and n eff , is given in Fig.…”
Section: B Numerical Algorithmsmentioning
confidence: 99%
“…( 8). Notably, analytic formulas representing G r,s and related double integrals G (k,j) (r,s) for large |r| and |s| can imply floatingnumber-arithmetic-related problems [18] since we must handle very large and small exponentials e ± √ r 2 +s 2 β0z . By treating large and small exponentials separately (i.e., replacing possibly huge matrices M p,j with bounded matrices M p,j ), avoiding division of very large and small numbers, and accounting for further computer-arithmetic problems (such as ε + 1 − 1 ≡ 0 whereas ε + (1 − 1) ≡ ε for |ε| < 10 −16 ), we could use otherwise unavailable large values of D: see, e.g., Fig.…”
Section: B Numerical Algorithmsmentioning
confidence: 99%