“…We adopt the concepts of determinization (Somani et al., 2013) and common random numbers (Owen, 2013), a popular variance reduction technique: To sample states and observations for episodes on level l, we use a deterministic simulative model, that is, a function f_l : S × A × [0,1] ↦ S × O such that, given a random variable ψ uniformly distributed in [0,1], (s′, o) = f_l(s, a, ψ) is distributed according to T_l(s, a, s′) O(s′, a, o). For an initial state s_0 ∼ b_t and a sequence of actions, the states and observations of an episode on level l are then deterministically generated from f_l using a sequence Ψ = (ψ_0, ψ_1, …) of i.i.d.…”
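
To make the idea concrete, here is a minimal Python sketch of a deterministic simulative model and of reusing one random sequence Ψ across two levels. It is an illustration under stated assumptions, not the method of the quoted work: the two-state toy problem, the names `make_model`, `slip` and `rollout`, and the choice of inverse-CDF sampling over the joint distribution of (s′, o) are all hypothetical.

```python
import random
from itertools import accumulate

STATES = (0, 1)
OBSERVATIONS = (0, 1)


def make_model(slip):
    """Return a deterministic model f(s, a, psi) -> (s_next, o).

    Hypothetical toy dynamics (not from the source): action a names the
    intended next state, and the move fails with probability `slip`.
    """

    def T(s, a, s_next):
        # Transition probability T(s, a, s'); independent of s in this toy.
        return 1.0 - slip if s_next == a else slip

    def O(s_next, a, o):
        # Observation probability O(s', a, o): correct reading 80% of the time.
        return 0.8 if o == s_next else 0.2

    def f(s, a, psi):
        # Inverse-CDF sampling over the joint distribution T(s,a,s') * O(s',a,o),
        # so a single psi in [0, 1] fixes both the next state and the observation.
        pairs = [(sn, o) for sn in STATES for o in OBSERVATIONS]
        cdf = accumulate(T(s, a, sn) * O(sn, a, o) for sn, o in pairs)
        for pair, c in zip(pairs, cdf):
            if psi < c:
                return pair
        return pairs[-1]  # guard against rounding when psi is close to 1

    return f


def rollout(f, s0, actions, psis):
    """Generate the (state, observation) pairs of one episode from f and psis."""
    s, episode = s0, []
    for a, psi in zip(actions, psis):
        s, o = f(s, a, psi)
        episode.append((s, o))
    return episode


# Common random numbers: the same sequence drives the episodes on both levels.
rng = random.Random(0)
psis = [rng.random() for _ in range(5)]
actions = [1, 0, 1, 1, 0]
coarse = rollout(make_model(slip=0.2), 0, actions, psis)
fine = rollout(make_model(slip=0.1), 0, actions, psis)
```

Because all randomness enters through `psis`, replaying the same sequence reproduces an episode exactly, and episodes on the two levels differ only where the models themselves disagree, which is what makes common random numbers useful for variance reduction.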