Just storing the Hessian H (the matrix of second derivatives ∂²E/∂wᵢ∂wⱼ of the error E with respect to each pair of weights) of a large neural network is difficult. Since a common use of a large matrix like H is to compute its product with various vectors, we derive a technique that directly calculates
Hv for an arbitrary vector v, which takes about as much computation, and is about as local, as a gradient evaluation. We then apply the technique to a one-pass gradient calculation algorithm (backpropagation), a relaxation gradient calculation algorithm (recurrent backpropagation), and two stochastic gradient calculation algorithms (Boltzmann Machines and weight perturbation). Finally, we show that this technique can be used at the heart of many iterative techniques for computing various properties of H, obviating any need to calculate the full Hessian.
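As an illustration (not part of the paper's derivation), the same idea is directly expressible in a modern automatic-differentiation framework: differentiating the gradient in forward mode along the direction v yields Hv at roughly the cost of one extra gradient pass, without ever forming H. The sketch below uses JAX and a hypothetical toy error function standing in for a network's error E(w).

# Illustrative sketch (assumed setup, not the paper's code): Hessian-vector
# product Hv via forward-mode differentiation of the gradient.
import jax
import jax.numpy as jnp

def error(w):
    # Hypothetical scalar error E(w); any differentiable scalar function works.
    return jnp.sum(jnp.sin(w) * w ** 2)

def hvp(f, w, v):
    # Hv = d/dr [ grad f(w + r v) ] evaluated at r = 0, computed without forming H.
    return jax.jvp(jax.grad(f), (w,), (v,))[1]

w = jnp.array([0.1, -0.5, 2.0])
v = jnp.array([1.0, 0.0, -1.0])

print(hvp(error, w, v))           # Hessian-vector product
print(jax.hessian(error)(w) @ v)  # full Hessian times v, for comparison only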