“…A number of methods augment the short- and long-term memory internal to recurrent networks with external "working" memory, in order to realize differentiable programming architectures that can learn to model and execute various programs (Sukhbaatar et al., 2015; Joulin and Mikolov, 2015; Reed and de Freitas, 2015; Grefenstette et al., 2015; Kurach et al., 2015). Unlike our approach, these methods explicitly decouple memory from computation, mimicking a standard computer architecture.…”