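…, applying a union bound over all $(s, a, s') \in \mathcal{S} \times \mathcal{A} \times \mathcal{S}$, and observing that $Z_i \sim P(s, a, s')$ is a Bernoulli random variable with empirical variance $V_n = \hat{P}(s, a, s')\bigl(1 - \hat{P}(s, a, s')\bigr)$, yields the result:
$$\Bigl\{\forall (s, a, s') \in \mathcal{S} \times \mathcal{A} \times \mathcal{S},\ \forall n > 1 : \bigl|\hat{P}(s, a, s') - P(s, a, s')\bigr| \le \psi_{sas'}(n)\Bigr\} \text{ holds with probability at least } 1 - \delta.$$
Observing that $\psi_{sas'}(n) \le \psi(n)$ for all $n > 1$, because $\psi_{sas'}(n)$ attains its maximum when $\hat{P}(s, a, s') = \frac{1}{2}$ (the point at which $V_n$ reaches its maximum value $\frac{1}{4}$), completes the proof.

Lemma B.2 (Inverting E).…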