“…Section 4 explains how a more general form of inference called Expectation Propagation (EP), which can be viewed as a form of expectation-maximisation, can be used for both belief tracking and parameter optimisation [5,6]. Section 5 explains how natural actor-critic reinforcement learning can be used to optimise the policy parameters P [7,8] and how, with a simple extension, it can also be used to optimise the dialogue model parameters M [9]. Finally, Section 6 addresses the problem of fast on-line policy optimisation using Gaussian processes as a non-parametric policy model [10,11,12].…”